Preface



This volume contains the 41 contributed papers and three invited papers
presented at the 9th Annual European Symposium on Algorithms, held in Århus,
Denmark, 28-31 August 2001. ESA 2001 continued the sequence:



- 1993 Bad Honnef (Germany)
- 1994 Utrecht (The Netherlands)
- 1995 Corfu (Greece)
- 1996 Barcelona (Spain)
- 1997 Graz (Austria)
- 1998 Venice (Italy)
- 1999 Prague (Czech Republic) and
- 2000 Saarbrücken (Germany).



The proceedings of these previous meetings were published as Springer LNCS 
volumes 726, 855, 979, 1136, 1284, 1461, 1643, and 1879. 

Papers were solicited in all areas of algorithmic research, including
approximation algorithms, combinatorial optimization, computational biology,
computational geometry, databases and information retrieval, external-memory
algorithms, graph and network algorithms, machine learning, online algorithms,
parallel and distributed computing, pattern matching and data compression,
randomized algorithms, and symbolic computation. Algorithms could be sequential,
distributed, or parallel, and should be analyzed either mathematically or
by rigorous computational experiments. Experimental and applied research were
especially encouraged.

Each extended abstract submitted was read by at least three referees and
evaluated on its quality, originality, and relevance to the symposium. The entire
Program Committee met at Paderborn University on 12-13 May 2001 and selected
41 papers for presentation from the 102 submissions. These, together with
three invited papers by Susanne Albers, Lars Arge, and Uri Zwick, are included
in this volume.

The Program Committee consisted of: 



Friedhelm Meyer auf der Heide (Paderborn; Chair)
Micah Adler (Amherst)
Pankaj Kumar Agarwal (Duke)
Mark de Berg (Utrecht)
Gerth Stølting Brodal (Århus)
Tom Cormen (Dartmouth)
Martin Dyer (Leeds)
Stefano Leonardi (Rome)
Peter Bro Miltersen (Århus)
Ian Munro (Waterloo)
Petra Mutzel (Wien)
Stefan Näher (Trier)
Yuval Rabani (Technion)
Jörg-Rüdiger Sack (Carleton)
Alistair Sinclair (Berkeley)
Dorothea Wagner (Konstanz)



ESA 2001 was held as a combined conference (ALGO 2001) together with the
Workshop on Algorithmic Engineering (WAE 2001) and the Workshop on Algorithms
in Bioinformatics (WABI 2001), and it was preceded by the Workshop
on Approximation and Randomized Algorithms in Communication Networks
(ARACNE 2001).

The seven distinguished invited speakers of ALGO 2001 were: 

- Susanne Albers (Universität Freiburg),
- Lars Arge (Duke University),
- Andrei Broder (AltaVista),
- Herbert Edelsbrunner (Duke University),
- Jotun Hein (University of Århus),
- Gene Myers (Celera Genomics), and
- Uri Zwick (Tel Aviv University).

The Organizing Committee of both ALGO 2001 and ESA 2001 consisted
of Gerth Stølting Brodal, Rolf Fagerberg, Karen Kjær Møller, Erik Meineche
Schmidt, and Christian Nørgaard Storm Pedersen, all from BRICS, University
of Århus. I am grateful to Gerth Stølting Brodal (BRICS), Tanja Burger, and
Rolf Wanka (Paderborn University) for their support of the program committee
work and the preparation of the proceedings. ESA 2001 was sponsored by
the European Association for Theoretical Computer Science (EATCS) and by
BRICS. We thank ACM SIGACT for providing us with the software used for
handling the electronic submissions.



Paderborn, June 2001 



Friedhelm Meyer auf der Heide 
External Memory Data Structures

(Invited Paper) 



Lars Arge* 

Department of Computer Science, Duke University, Durham, NC 27708, USA 



Abstract. Many modern applications store and process datasets much 
larger than the main memory of even state-of-the-art high-end machines. 
Thus massive and dynamically changing datasets often need to be stored 
in data structures on external storage devices, and in such cases the 
Input/Output (or I/O) communication between internal and external 
memory can become a major performance bottleneck. In this paper we 
survey recent advances in the development of worst-case I/O-efficient
external memory data structures. 



1 Introduction 

Many modern applications store and process datasets much larger than the main
memory of even state-of-the-art high-end machines. Thus massive and dynamically
changing datasets often need to be stored in space efficient data structures
on external storage devices such as disks, and in such cases the Input/Output
(or I/O) communication between internal and external memory can become a
major performance bottleneck. Many massive dataset applications involve
geometric data (for example, points, lines, and polygons) or data which can be
interpreted geometrically. Such applications often perform queries which
correspond to searching in massive multidimensional geometric databases for objects
that satisfy certain spatial constraints. Typical queries include reporting the
objects intersecting a query region, reporting the objects containing a query point,
and reporting objects near a query point.

While development of practically efficient (and ideally also multi-purpose)
external memory data structures (or indexes) has always been a main concern
in the database community, most data structure research in the algorithms
community has focused on worst-case efficient internal memory data structures.
Recently, however, there has been some cross-fertilization between the two areas. In
this paper we survey recent advances in the development of worst-case efficient
external memory data structures. We will concentrate on data structures for
geometric problems, especially the important one- and two-dimensional range
searching problems, but mention other structures when appropriate. A more
comprehensive discussion can be found in a recent survey by the author [16].

* Supported in part by the National Science Foundation through ESS grant
EIA-9870734, RI grant EIA-9972879, and CAREER grant EIA-9984099.



F. Meyer auf der Heide (Ed.): ESA 2001, LNCS 2161, pp. 1-29, 2001. 
© Springer-Verlag Berlin Heidelberg 2001
Model of computation. Accurately modeling memory and disk systems is a
complex task [131]. The primary feature of disks we want to model is their
extremely long access time relative to that of internal memory. In order to amortize
the access time over a large amount of data, typical disks read or write large
blocks of contiguous data at once, and therefore the standard two-level disk model
has the following parameters [13,153,104]:

N = number of objects in the problem instance; 

T = number of objects in the problem solution; 

M = number of objects that can fit into internal memory; 

B = number of objects per disk block; 

where B < M < N. An I/O operation (or simply I/O) is the operation of reading
(or writing) a block from (or into) disk. Refer to Figure 1. Computation can only 
be performed on objects in internal memory. The measures of performance in 
this model are the number of I/Os used to solve a problem, as well as the amount 
of space (disk blocks) used and the internal memory computation time. 
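To make the orders of magnitude concrete, the following Python calculation evaluates the basic bounds of this model. It is only a rough sketch: constant factors are ignored, the logarithms are evaluated as real numbers, and the sizes (N = 2^30, M = 2^20, and B = 2^10 objects) are hypothetical. The sorting bound is the one by Aggarwal and Vitter quoted in Section 3.

```python
import math

def search_ios(N, B):
    """One root-to-leaf search in a B-tree: about log_B N I/Os."""
    return math.log(N, B)

def scan_ios(N, B):
    """Reading N contiguous objects: ceil(N / B) I/Os."""
    return math.ceil(N / B)

def sort_ios(N, M, B):
    """Sorting: (N/B) * log_{M/B}(N/B) I/Os, constant factors ignored."""
    n = N / B
    return n * math.log(n, M / B)

# Hypothetical input and machine sizes, measured in objects.
N, M, B = 2**30, 2**20, 2**10

print(f"one search:             {search_ios(N, B):.1f} I/Os")
print(f"scan of all N objects:  {scan_ios(N, B)} I/Os")
print(f"sort of all N objects:  {sort_ios(N, M, B):,.0f} I/Os")
print(f"N one-by-one B-tree insertions: {N * search_ios(N, B):,.0f} I/Os")
```

With these values a search costs about 3 I/Os, while inserting the N objects one at a time into a B-tree costs more than a thousand times as many I/Os as sorting them.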

Several authors have considered more accurate and complex multi-level memory
models than the two-level model. An increasingly popular approach to increasing
the performance of I/O systems is to use several disks in parallel, so much work
has been done on multi-disk models; see, e.g., the recent survey by
Vitter [151]. We will concentrate on the two-level one-disk model, since the data
structures and data structure design techniques developed in this model often
work well in more complex models. For brevity we will also ignore internal
computation time.

Fig. 1. Disk model: an I/O moves B contiguous elements between disk and main
memory (of size M).

Outline of paper. The rest of this paper is organized as follows. In Section 2
we discuss the B-tree, the most fundamental (one-dimensional) external data
structure, as well as recent variants and extensions of the structure, and in
Section 3 we discuss the so-called buffer trees. In Section 4 we illustrate some
of the important techniques and ideas used in the development of provably
I/O-efficient data structures for higher-dimensional problems through a discussion
of the external priority search tree for 3-sided planar range searching. We also
discuss a general method for obtaining a dynamic data structure from a static
one. In Section 5 we discuss data structures for general (4-sided) planar range
searching, and in Section 6 we survey results on external data structures for
interval management, point location, range counting, higher-dimensional range
searching, halfspace range searching, range searching among moving points, and
proximity queries. Several of the worst-case efficient structures we consider are
simple enough to be of practical interest. Still, there are many good reasons
for developing simpler (heuristic) and general-purpose structures without
worst-case performance guarantees, and a large number of such structures have been
developed in the database community. Even though the focus of this paper is on
provably worst-case efficient data structures, in Section 7 we give a short survey
of some of the major classes of such heuristic-based structures. The reader is
referred to recent surveys for a more complete discussion [12,85,121]. Finally, in
Section 8 we discuss some of the efforts which have been made to implement the
developed worst-case efficient structures.

2 B-Trees 

The B-tree is the most fundamental external memory data structure,
corresponding to an internal memory balanced search tree [35,63,104,95]. It uses
linear space, $O(N/B)$ disk blocks, and supports insertions and deletions in
$O(\log_B N)$ I/Os. One-dimensional range queries, asking for all elements in the
tree in a query interval $[q_1, q_2]$, can be answered in $O(\log_B N + T/B)$ I/Os.

The space, update, and query bounds obtained by the B-tree are the bounds
we would like to obtain in general for more complicated problems. The bounds
are significantly better than the bounds we would obtain if we just used an
internal memory data structure and virtual memory. The $O(N/B)$ space bound
is obviously optimal and the $O(\log_B N + T/B)$ query bound is optimal in a
comparison model of computation. Note that the query bound consists of an
$O(\log_B N)$ search term corresponding to the familiar $O(\log_2 N)$ internal memory
search term, and an $O(T/B)$ reporting term accounting for the $O(T/B)$ I/Os
needed to report $T$ elements. Recently, the above bounds have been obtained
for a number of problems (e.g., [30,26,149,5,47,87]), but higher lower bounds have
also been established for some problems [141,26,93,101,106,135,102]. We discuss
these results in later sections.

B-trees come in several variants, like B+-trees and B*-trees (see, e.g.,
[35,63,95,30,104,3] and their references). A basic B-tree is a $\Theta(B)$-ary tree (with the root
possibly having smaller degree) built on top of $\Theta(N/B)$ leaves. The degree of
internal nodes, as well as the number of elements in a leaf, is typically kept in
the range $[B/2, B]$ such that a node or leaf can be stored in one disk block.
All leaves are on the same level and the tree has height $O(\log_B N)$; refer to
Figure 2. In the most popular B-tree variants, the $N$ data elements are stored
in the leaves (in sorted order) and each internal node holds $\Theta(B)$ “routing” (or
“splitting”) elements used to guide searches.

Fig. 2. B-tree: all internal nodes (except possibly the root) have fan-out $\Theta(B)$ and
there are $\Theta(N/B)$ leaves. The tree has height $O(\log_B N)$.

To answer a range query $[q_1, q_2]$ on a B-tree we first search down the tree
for $q_1$ and $q_2$ using $O(\log_B N)$ I/Os, and then we report the elements in the
$O(T/B)$ leaves between the leaves containing $q_1$ and $q_2$. We perform an insertion
in $O(\log_B N)$ I/Os by first searching down the tree for the relevant leaf $l$. If there
is room for the new element in $l$ we simply store it there. If not, we split $l$ into two
leaves $l'$ and $l''$ of approximately the same size and insert the new element in the
relevant leaf. The split of $l$ results in the insertion of a new routing element in the
parent of $l$, and thus the need for a split may propagate up the tree. Propagation
of splits can often be avoided by sharing some of the (routing) elements of the
full node with a non-full sibling. A new (degree 2) root is produced when the
root splits, and the height of the tree grows by one. Similarly, we can perform
a deletion in $O(\log_B N)$ I/Os by first searching for the relevant leaf $l$ and then
removing the deleted element. If this results in $l$ containing too few elements, we
either fuse it with one of its siblings (corresponding to deleting $l$ and inserting
its elements in the sibling), or we perform a share operation by moving elements
from a sibling to $l$. Like splits, fuse operations may propagate up the tree and
eventually result in the height of the tree decreasing by one.
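The search, insertion, and range query procedures described above can be sketched in Python as follows. This is a minimal in-memory B+-tree sketch, not a disk-based implementation: the node capacity `b` stands in for B, node visits are counted as a rough proxy for I/Os, and deletion, sharing with siblings, and the actual block layout are omitted. Leaves are linked left to right so that a range query can report the leaves between the search paths for $q_1$ and $q_2$.

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = keys            # sorted keys stored in this leaf
        self.next = None            # right sibling, used by range queries

class Internal:
    def __init__(self, routers, children):
        self.routers = routers      # routers[i] separates children[i] and children[i + 1]
        self.children = children

class BTree:
    """B+-tree sketch: a node splits once it holds more than b keys or children."""

    def __init__(self, b=4):
        self.b = b
        self.root = Leaf([])
        self.ios = 0                # node visits, a stand-in for I/Os

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:       # the root split: create a new degree-2 root
            router, right = split
            self.root = Internal([router], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (router, new right sibling) if node split."""
        self.ios += 1
        if isinstance(node, Leaf):
            bisect.insort(node.keys, key)
            if len(node.keys) <= self.b:
                return None
            mid = len(node.keys) // 2           # split l into l' and l''
            right = Leaf(node.keys[mid:])
            node.keys = node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        i = bisect.bisect_right(node.routers, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        router, right = split
        node.routers.insert(i, router)          # new routing element in the parent
        node.children.insert(i + 1, right)
        if len(node.children) <= self.b:
            return None
        mid = len(node.children) // 2           # the split propagates upward
        up = node.routers[mid - 1]
        right = Internal(node.routers[mid:], node.children[mid:])
        node.routers = node.routers[:mid - 1]
        node.children = node.children[:mid]
        return up, right

    def range_query(self, q1, q2):
        """Report all keys in [q1, q2] in sorted order."""
        node = self.root
        while isinstance(node, Internal):       # search down for q1
            self.ios += 1
            node = node.children[bisect.bisect_left(node.routers, q1)]
        out = []
        while node is not None:                 # walk the leaves to the right
            self.ios += 1
            for k in node.keys:
                if k > q2:
                    return out
                if k >= q1:
                    out.append(k)
            node = node.next
        return out
```

For example, inserting a permutation of 0..999 into `BTree(b=8)` and calling `range_query(100, 199)` returns the hundred keys 100..199 after a few node visits per tree level plus one visit per reported leaf.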

B-tree variants and extensions. Recently, several important variants and 
extensions of B-trees have been considered. 

Arge and Vitter [30] developed the weight-balanced B-trees, which can be
viewed as an external version of BB[$\alpha$]-trees [120]. Weight-balanced B-trees are
very similar to B-trees but with a weight constraint imposed on each node in
the tree instead of a degree constraint. The weight of a node $v$ is defined as
the number of elements in the leaves of the subtree rooted in $v$. Like B-trees,
weight-balanced B-trees are rebalanced using split and fuse operations, and a
key property of weight-balanced B-trees is that after performing a rebalance
operation on a weight $\Theta(w)$ node $v$, $\Omega(w)$ updates have to be performed below
$v$ before another rebalance operation needs to be performed on $v$. This means
that an update can still be performed in $O(\log_B N)$ I/Os (amortized) even if
$v$ has a large associated secondary structure that needs to be updated when a
rebalance operation is performed on $v$, provided that the secondary structure can
be updated in $O(w)$ I/Os: the $O(w)$ cost can be charged to the $\Omega(w)$ updates
below $v$, that is, $O(1)$ I/Os per update for each of the $O(\log_B N)$ nodes on its path.
Weight-balanced B-trees have been used in numerous
efficient data structures (see, e.g., [30,26,89,90,38,3,28]).

In some applications we need to be able to traverse a path in a B-tree from a
leaf to the root. To do so we need a parent pointer from each node to its parent.
Such pointers can easily be maintained efficiently in normal B-trees or
weight-balanced B-trees, but cannot be maintained efficiently if we also want to support
divide and merge operations. A divide operation at element $x$ constructs two
trees containing all elements less than and greater than $x$, respectively. A merge
operation performs the inverse operation. Without parent pointers, B-trees and
weight-balanced B-trees support the two operations in $O(\log_B N)$ I/Os.
Agarwal et al. [3] developed the level-balanced B-trees, in which divide and merge
operations can be supported I/O-efficiently while maintaining parent pointers.
In level-balanced B-trees a global balance condition is used instead of the
local degree or weight conditions used in B-trees or weight-balanced B-trees; a
constraint is imposed on the number of nodes on each level of the tree. When
the constraint is violated, the whole subtree at that level and above is rebuilt.
Level-balanced B-trees have applications in, e.g., dynamic maintenance of planar
st-graphs [3].

Partially persistent B-trees, that is, B-trees where each update results in a
new version of the structure, and where both the current and older versions can
be queried (in the database community often called multiversion B-trees), are
useful not only in database applications where previous versions of the database
need to be stored and queried, but (as we will discuss in Section 4) also in the
solution of many geometric problems. Using standard persistence techniques [137,70],
a persistent B-tree can be designed such that updates can be performed
in $O(\log_B N)$ I/Os and such that any version of the tree can be queried in
$O(\log_B N + T/B)$ I/Os. Here $N$ is the number of updates, and the tree uses
$O(N/B)$ disk blocks [36,147].

In string applications a data element (a string of characters) can often be
arbitrarily long, or different elements can be of different lengths. Such elements cannot
be manipulated efficiently in standard B-trees, which assume that elements (and
thus routing elements) are of unit size. Ferragina and Grossi developed the
elegant string B-tree, where a query string $q$ is routed through a node using a
so-called blind trie data structure [78]. A blind trie is a variant of the compacted
trie [104,117] which fits in one disk block. In this way a query can be answered
in $O(\log_B N + |q|/B)$ I/Os. See [64,80,77,20] for other results on string B-trees
and external string processing.



3 Buffer Trees 



In internal memory, an $N$ element search tree can be constructed in optimal
$O(N \log_2 N)$ time simply by inserting the elements one by one. This construction
algorithm can also be used as an optimal sorting algorithm. In external memory,
we would use $O(N \log_B N)$ I/Os to build a B-tree using the same method.
Interestingly, this is not optimal, since Aggarwal and Vitter showed that sorting $N$
elements in external memory takes $\Theta(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os [13]. We can of course
build a B-tree in the same bound by first sorting the elements and then building
the tree level-by-level bottom-up.
In order to obtain an optimal sorting algorithm based on a search tree
structure, we would need a structure that supports updates in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ I/Os
(amortized). The inefficiency of the B-tree sorting algorithm is a consequence of the B-tree
being designed to be used in an “on-line” setting where queries should be
answered immediately; updates and queries are handled on an individual basis.
This way we are not able to take full advantage of the large internal memory.
It turns out that in an “off-line” environment, where we are only interested in
the overall I/O use of a series of operations and where we are willing to relax
the demands on the query operations, we can develop data structures on which
a series of $N$ operations can be performed in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os in total. To
do so we use the buffer tree technique developed by Arge [14].

Basically, the buffer tree is just a fan-out $\Theta(M/B)$ B-tree where each internal
node has a buffer of size $\Theta(M)$. The tree has height $O(\log_{M/B}\frac{N}{B})$; refer to
Figure 3. Operations are performed in a “lazy” manner: in order to perform an
insertion we do not (as in a normal B-tree) search all the way down the tree for
the relevant leaf. Instead, we wait until we have collected a block of insertions
and then we insert this block in the buffer of the root (which is stored on disk).
When a buffer “runs full” its elements are “pushed” one level down to buffers
on the next level. We can do so in $O(M/B)$ I/Os since the elements in the
buffer fit in main memory and the fan-out of the tree is $O(M/B)$. If the buffer
of any of the nodes on the next level becomes full by this process, the
buffer-emptying process is applied recursively. Since we push $\Theta(M)$ elements one level
down the tree using $O(M/B)$ I/Os (that is, we use $O(1)$ I/Os to push one block
one level down), we can argue that every block of elements is touched a constant
number of times on each of the levels of the tree. Thus, not counting rebalancing,
inserting $N$ elements requires $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os in total, or $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$
amortized. Arge showed that rebalancing can be handled in the same bound [14].
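The lazy insertion and buffer-emptying process just described can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Arge's full structure: the tree shape is fixed in advance from hypothetical routing keys `splitters`, so rebalancing is omitted; `fanout`, `buffer_size`, and `block` stand in for $M/B$, $M$, and $B$; leaves are plain lists sorted only when the result is read out; and `ios` is only a rough count of blocks read and written.

```python
import bisect
import math

class Node:
    """Internal node: routing keys, children, and a buffer of delayed inserts."""
    def __init__(self, splitters, children):
        self.splitters = splitters   # len(children) == len(splitters) + 1
        self.children = children     # Node objects, or plain lists acting as leaves
        self.buffer = []

class BufferTree:
    """Insert-only buffer tree sketch with a fixed shape (no rebalancing).

    fanout stands in for M/B, buffer_size for M, and block for B.
    splitters are hypothetical routing keys chosen in advance; fanout >= 2.
    """

    def __init__(self, splitters, fanout, buffer_size, block):
        self.M, self.B, self.ios = buffer_size, block, 0
        self.pending = []            # the one block of inserts kept in memory
        # Build bottom-up: group `fanout` consecutive children under one node.
        level, keys = [[] for _ in range(len(splitters) + 1)], sorted(splitters)
        while not (len(level) == 1 and isinstance(level[0], Node)):
            nodes, seps = [], []
            for j in range(0, len(level), fanout):
                group = level[j:j + fanout]
                nodes.append(Node(keys[j:j + len(group) - 1], group))
                if j + fanout < len(level):
                    seps.append(keys[j + fanout - 1])
            level, keys = nodes, seps
        self.root = level[0]

    def insert(self, x):
        self.pending.append(x)
        if len(self.pending) == self.B:         # a full block goes to the root buffer
            self._push(self.root, self.pending)
            self.pending = []

    def _push(self, node, items):
        self.ios += math.ceil(len(items) / self.B)   # write items to node's buffer
        node.buffer.extend(items)
        if len(node.buffer) >= self.M:               # the buffer "runs full"
            self._empty(node)

    def _empty(self, node):
        """Buffer-emptying: read the buffer and distribute it one level down."""
        items, node.buffer = node.buffer, []
        self.ios += math.ceil(len(items) / self.B)   # read the buffer
        groups = [[] for _ in node.children]
        for x in items:
            groups[bisect.bisect_right(node.splitters, x)].append(x)
        for child, group in zip(node.children, groups):
            if not group:
                continue
            if isinstance(child, Node):
                self._push(child, group)             # may empty recursively
            else:
                self.ios += math.ceil(len(group) / self.B)
                child.extend(group)                  # leaf: append, sort on output

    def sorted_items(self):
        """Flush every buffer top-down, then read the leaves left to right."""
        if self.pending:
            self._push(self.root, self.pending)
            self.pending = []
        out = []
        self._flush(self.root, out)
        return out

    def _flush(self, node, out):
        self._empty(node)
        for child in node.children:
            if isinstance(child, Node):
                self._flush(child, out)
            else:
                out.extend(sorted(child))
```

Inserting all elements and then calling `sorted_items()` yields them in sorted order, which is the sense in which an insert-only buffer tree gives a sorting algorithm; the elements can arrive over time rather than all at once.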

The basic buffer tree supporting insertions only can be used to construct a
B-tree in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os (without explicitly sorting). This of course means
that buffer trees can be used to design an optimal sorting algorithm. Note that
unlike other sorting algorithms, the $N$ elements to be sorted do not all need to
be given at the start of this algorithm. Deletions and (one-dimensional) range
queries can also be supported I/O-efficiently using buffers [14]. The range queries
are batched in the sense that we do not obtain the result of a query immediately.
Instead, parts of the result will be reported at different times as the query is
pushed down the tree. This means that the data structure can only be used
in algorithms where future updates and queries do not depend on the result of
the queries. Luckily this is the case in many plane-sweep algorithms [73,14]. In
general, problems where the entire sequence of updates and queries is known in
advance, and the only requirement on the queries is that they must all eventually
be answered, are known as batched dynamic problems [73].

Fig. 3. Buffer tree: fan-out $M/B$ tree where each node has a buffer of size $M$.
Operations are performed in a lazy way using the buffers.

As mentioned, persistent B-trees are often used in geometric data structures.
Often, a data structure is constructed by performing $N$ insertions and deletions
on an initially empty persistent B-tree, and then the resulting (static) structure
is used to answer queries. Using the standard update algorithms the construction
would take $O(N \log_B N)$ I/Os. A straightforward application of the buffer tree
technique improves this to the optimal $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os [145,21] (another
optimal, but not linear space, algorithm can be designed using the
distribution-sweeping technique [86]). Several other data structures can also be constructed
efficiently using buffers, and the buffer tree technique has been used to develop
further data structures, which in turn have been used to develop algorithms
in many different areas [25,29,20,21,107,15,76,46,144,145,96,44,136].

Priority queues. External buffered priority queues have been extensively
researched because of their applications in graph algorithms. Arge showed how to
perform deletemin operations on a basic buffer tree in amortized $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$
I/Os [14]. Note that in this case the deletemin occurs right away, that is, it is
not batched. This is accomplished by periodically computing the $\Theta(M)$ smallest
elements in the structure and storing them in internal memory. Kumar and
Schwabe [107] and Fadel et al. [76] developed similar buffered heaps. Using a
partial rebuilding idea, Brodal and Katajainen [45] developed a worst-case
efficient external priority queue. Using the buffer tree technique on a tournament
tree, Kumar and Schwabe [107] developed a priority queue supporting update
operations in $O(\frac{1}{B}\log_2\frac{N}{B})$ I/Os. They also showed how to use their structure in
several efficient external graph algorithms (see, e.g., [2,7,18,22,27,46,59,81,97,107,
110,111,116,118,122,142,156] for other results on external graph algorithms and
data structures). Note that if the priority of an element is known, an update
operation can be performed in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ I/Os on a buffer tree using a
delete and an insert operation.
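The idea of keeping the smallest elements in internal memory, so that a deletemin can be answered right away, can be sketched in Python as below. This is a simplified illustration under stated assumptions: a plain unsorted list (`disk`) stands in for the external buffer tree, a sort of that list stands in for the periodic I/O-efficient computation of the $\Theta(M)$ smallest elements, and the parameter `m` plays the role of $\Theta(M)$. The invariant is that every element held in memory is no larger than any element on "disk".

```python
import bisect

class BufferedPQ:
    """Keeps (up to) the m smallest elements in memory; the rest live 'on disk'.

    Invariant: every element of self.mem is <= every element of self.disk.
    """

    def __init__(self, m):
        self.m = m
        self.mem = []        # sorted list holding the smallest elements
        self.disk = []       # unsorted stand-in for the external structure
        self.refills = 0     # how often the m smallest elements were recomputed

    def insert(self, x):
        if not self.disk or (self.mem and x <= self.mem[-1]):
            bisect.insort(self.mem, x)
            if len(self.mem) > self.m:           # overflow: evict the largest
                self.disk.append(self.mem.pop())
        else:
            self.disk.append(x)

    def deletemin(self):
        if not self.mem:
            if not self.disk:
                raise IndexError("deletemin from empty priority queue")
            self.disk.sort()                     # recompute the m smallest elements
            self.mem, self.disk = self.disk[:self.m], self.disk[self.m:]
            self.refills += 1
        return self.mem.pop(0)                   # answered immediately, not batched
```

Because a refill happens only after the in-memory elements are used up, its cost is spread over many deletemin operations, which is the intuition behind the amortized bound.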



4 3-Sided Planar Range Searching 

In internal memory many elegant data structures have been developed for
higher-dimensional problems like range searching; see, e.g., the recent survey by Agarwal
and Erickson [12]. Unfortunately, most of these structures are not efficient when
mapped directly to external memory, mainly because they are normally based
on binary trees. The main challenge when developing efficient external structures
is to use B-trees as base structures, that is, to use multiway trees instead of binary
trees. Recently, some progress has been made in the development of provably
I/O-efficient data structures based on multiway trees. In this section we consider
a special case of two-dimensional range searching, namely the 3-sided planar
range searching problem: given a set of points in the plane, the solution to a
3-sided query $(q_1, q_2, q_3)$ consists of all points $(x, y)$ with $q_1 \le x \le q_2$ and $y \ge q_3$.
The solution to this problem is not only an important component of the solution
to the general planar range searching problem we discuss in Section 5, but it
also illustrates many of the techniques and ideas utilized in the development of
other external data structures.

The static version of the 3-sided problem, where the points are fixed, can easily
be solved I/O-efficiently using a sweeping idea and a persistent B-tree: consider
sweeping the plane with a horizontal line from $y = \infty$ to $y = -\infty$ and inserting
the x-coordinates of points in a persistent B-tree as they are met. To answer a
query $(q_1, q_2, q_3)$ we simply perform a one-dimensional range query $[q_1, q_2]$ on
the version of the persistent B-tree we had when the sweep line was at $y = q_3$.
Following the discussion in Section 2, the structure obtained this way uses linear
space and queries can be answered in $O(\log_B N + T/B)$ I/Os. The structure can
be constructed in $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os using the buffer tree technique.
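The sweep-and-query reduction above can be illustrated with the following Python sketch. It shows only what a query computes, not the space or I/O bounds: each version of the swept set is stored as a full sorted copy, which takes quadratic space, whereas the text uses a linear-space persistent B-tree. Points are assumed to be `(x, y)` tuples of numbers.

```python
import bisect

INF = float("inf")

def build_static(points):
    """Sweep a horizontal line from y = +inf down to y = -inf.

    Version k holds the k points met so far (those with the largest y),
    ordered by x. Full copies stand in for a persistent B-tree.
    """
    pts = sorted(points, key=lambda p: -p[1])
    neg_ys = [-p[1] for p in pts]            # non-decreasing, usable with bisect
    versions, current = [[]], []
    for p in pts:
        bisect.insort(current, p)            # tuples compare by x first
        versions.append(list(current))
    return neg_ys, versions

def query_static(structure, q1, q2, q3):
    """Report all points (x, y) with q1 <= x <= q2 and y >= q3, sorted by x."""
    neg_ys, versions = structure
    k = bisect.bisect_right(neg_ys, -q3)     # number of points with y >= q3
    v = versions[k]                          # version when the sweep line was at y = q3
    lo = bisect.bisect_left(v, (q1, -INF))   # one-dimensional range query [q1, q2]
    hi = bisect.bisect_right(v, (q2, INF))
    return v[lo:hi]
```

For instance, with points `[(1, 5), (2, 1), (3, 4)]`, the query `(1, 3, 3)` uses the version containing the two points with $y \ge 3$ and returns `[(1, 5), (3, 4)]`.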

From the static solution we can obtain a linear space dynamic structure which
answers a query in $O(\log_B^2 N + T/B)$ I/Os and can be updated in $O(\log_B^2 N)$
I/Os using an external version of the logarithmic method for transforming a
static structure into a dynamic structure [39,125]. This technique was developed
by Arge and Vahrenhold as part of the design of a dynamic external planar point
location structure (see Section 6). Due to its general interest, we describe the
technique in Section 4.1 below. An optimal $O(\log_B N)$ query structure can be
obtained in a completely different way, and we discuss this result in Section 4.2.



4.1 The Logarithmic Method 

In internal memory, the main idea in the logarithmic method is to partition
the set of $N$ elements into $\log_2 N$ subsets of exponentially increasing size $2^i$,
$i = 0, 1, 2, \ldots$, and build a static structure $\mathcal{D}_i$ for each of these subsets. Queries are
then performed by querying each $\mathcal{D}_i$ and combining the answers, while insertions
are performed by finding the first empty $\mathcal{D}_i$, discarding all structures $\mathcal{D}_j$, $j < i$,
and building $\mathcal{D}_i$ from the new element and the $\sum_{l=0}^{i-1} 2^l = 2^i - 1$ elements in the
discarded structures.
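The internal memory version of the method behaves like a binary counter, as the following Python sketch shows. Sorted lists stand in for the static structures $\mathcal{D}_i$ (an assumption made for illustration; any static structure with a from-scratch construction would do), and a range query stands in for the decomposable query whose answers are combined.

```python
import bisect

class LogMethod:
    """Internal logarithmic method: D_i is empty or a sorted list of 2**i keys."""

    def __init__(self):
        self.levels = []                        # levels[i] plays the role of D_i

    def insert(self, x):
        carry = [x]
        i = 0
        while i < len(self.levels) and self.levels[i]:   # skip the full D_i
            carry.extend(self.levels[i])        # discard D_i, keep its elements
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append([])
        self.levels[i] = sorted(carry)          # 1 + (2**i - 1) = 2**i elements

    def range_query(self, lo, hi):
        out = []
        for d in self.levels:                   # query each D_i, combine answers
            out.extend(d[bisect.bisect_left(d, lo):bisect.bisect_right(d, hi)])
        return sorted(out)
```

After 100 insertions the non-empty structures are $\mathcal{D}_2$, $\mathcal{D}_5$, and $\mathcal{D}_6$, since $100 = 4 + 32 + 64$.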

To make the logarithmic method I/O-efficient we need to decrease the number
of subsets to $\log_B N$, which in turn means increasing the size of $\mathcal{D}_i$ to $B^i$. When
doing so, the structures $\mathcal{D}_j$, $j < i$, do not contain enough objects to build $\mathcal{D}_i$
(since $1 + \sum_{l=0}^{i-1} B^l < B^i$). However, it turns out that if we can build a static structure
I/O-efficiently enough, this problem can be resolved and we can make a modified
version of the method work in external memory.

Consider a static structure $\mathcal{D}$ that can be constructed in $O(\frac{N}{B}\log_B N)$ I/Os
and that answers queries in $O(\log_B N)$ I/Os (note that $O(\frac{N}{B}\log_{M/B}\frac{N}{B}) =
O(\frac{N}{B}\log_B N)$ if $M \ge B^2$). We partition the $N$ elements into $\log_B N$ sets such
that the $i$th set has size less than $B^i + 1$ and construct an external memory static
data structure $\mathcal{D}_i$ for each set; refer to Figure 4. To answer a query, we simply
query each $\mathcal{D}_i$ and combine the results using $\sum_i O(\log_B |\mathcal{D}_i|) = O(\log_B^2 N)$
I/Os. We perform an insertion by finding the first structure $\mathcal{D}_i$ such that
$\sum_{j \le i} |\mathcal{D}_j| < B^i$, discarding all structures $\mathcal{D}_j$, $j \le i$, and building a new $\mathcal{D}_i$ from
the new element and the elements in these structures using $O(\frac{B^i}{B}\log_B B^i) = O(B^{i-1}\log_B N)$
I/Os. Now, because of the way $\mathcal{D}_i$ was chosen, we know that $\sum_{j < i} |\mathcal{D}_j| \ge B^{i-1}$.
This means that at least $B^{i-1}$ objects are moved from lower indexed structures
to $\mathcal{D}_i$. If we divide the $\mathcal{D}_i$ construction cost between these objects, each object is
charged $O(\log_B N)$ I/Os. Since an object never moves to a lower indexed
structure, we can charge it at most $O(\log_B N)$ times during $N$ insertions. Thus the
amortized cost of an insertion is $O(\log_B^2 N)$ I/Os. Note that the key to making
the method work is that the factor of $B$ we lose when charging the construction
of a structure of size $B^i$ to only $B^{i-1}$ objects is offset by the $1/B$ factor in the
construction bound. Deletions can also be handled I/O-efficiently using a global
rebuilding idea.

Fig. 4. Logarithmic method: $\log_B N$ structures; $\mathcal{D}_i$ contains less than $B^i + 1$ elements.
$\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_j$ do not contain enough elements to build $\mathcal{D}_{j+1}$ of size $B^{j+1}$.
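The modified insertion rule can be sketched in Python as follows. As before, sorted lists stand in for the static external structures $\mathcal{D}_i$ (an illustrative assumption), and no I/Os are counted; the sketch only shows the rule of finding the first $i$ with $\sum_{j \le i} |\mathcal{D}_j| < B^i$, discarding $\mathcal{D}_0, \ldots, \mathcal{D}_i$, and rebuilding $\mathcal{D}_i$ from their elements and the new one.

```python
import bisect

class ExternalLogMethod:
    """Logarithmic method with base B: D_i holds at most B**i keys.

    Sorted Python lists stand in for the static external structures D_i.
    """

    def __init__(self, B):
        self.B = B
        self.levels = []                       # levels[i] plays the role of D_i

    def insert(self, x):
        # Find the first i with |D_0| + ... + |D_i| < B**i, so that the
        # discarded structures plus the new element fit in a size-B**i D_i.
        i, total = 0, 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            total += len(self.levels[i])
            if total < self.B ** i:
                break
            i += 1
        merged = [x]
        for j in range(i + 1):                 # discard D_0, ..., D_i
            merged.extend(self.levels[j])
            self.levels[j] = []
        self.levels[i] = sorted(merged)        # rebuild D_i from scratch

    def range_query(self, lo, hi):
        """Query every D_i and combine the answers."""
        out = []
        for d in self.levels:
            out.extend(d[bisect.bisect_left(d, lo):bisect.bisect_right(d, hi)])
        return sorted(out)
```

Because reaching level $i$ requires $\sum_{j < i} |\mathcal{D}_j| \ge B^{i-1}$, the number of levels stays around $\log_B N$; with $B = 4$ and 1000 insertions at most six levels are ever created.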



4.2 Optimal Dynamic Structure 

Following several earlier attempts [101,127,141,43,98], Arge et al. [26] developed
an optimal dynamic structure for the 3-sided planar range searching problem.
The structure is an external version of the internal memory priority search tree
structure [113]. The external priority search tree consists of a base B-tree on the
x-coordinates of the $N$ points. A range $X_v$ (containing all points below $v$) can be
associated with each node $v$ in a natural way. This range is subdivided into $\Theta(B)$
subranges associated with the children of $v$. For illustrative purposes we call the
subranges slabs. In each node $v$ we store $O(B)$ points for each of $v$'s $\Theta(B)$ children
$v_i$, namely the $B$ points with the highest y-coordinates in the x-range of $v_i$ (if
existing) that have not been stored in ancestors of $v$. We store the $O(B^2)$ points
in the linear space static structure discussed above (the “$O(B^2)$-structure”) such
that a 3-sided query on them can be answered in $O(\log_B B^2 + T/B) = O(1 + T/B)$
I/Os. Since every point is stored in precisely one $O(B^2)$-structure, the structure
uses $O(N/B)$ space in total.

To answer a 3-sided query $(q_1, q_2, q_3)$ we start at the root of the external
priority search tree and proceed recursively to the appropriate subtrees; when
visiting a node $v$ we query the $O(B^2)$-structure and report the relevant points,
and then we advance the search to some of the children of $v$. The search is
advanced to child $v_i$ if $v_i$ is either along the leftmost search path for $q_1$ or the
rightmost search path for $q_2$, or if the entire set of points corresponding to $v_i$
in the $O(B^2)$-structure were reported; refer to Figure 5. The query procedure
reports all points in the query range since, if we do not visit a child $v_i$ corresponding
to a slab completely spanned by the interval $[q_1, q_2]$, it means that at least one
of the points in the $O(B^2)$-structure corresponding to $v_i$ does not satisfy the
query. This in turn means that none of the points in the subtree rooted at $v_i$
can satisfy the query, since they all have y-coordinates no larger than those of
the points stored for $v_i$. That we use $O(\log_B N + T/B)$ I/Os to answer a query can
be seen as follows. In every internal node $v$ visited by the query procedure we
spend $O(1 + T_v/B)$ I/Os, where $T_v$ is the number of points reported. There are
$O(\log_B N)$ nodes visited on the search paths in the tree to the leaf containing
$q_1$ and the leaf containing $q_2$, and thus the number of I/Os used in these nodes
adds up to $O(\log_B N + T/B)$. Each remaining visited internal node $v$ is not on
the search path, but it is visited because $\Theta(B)$ points corresponding to it were
reported when we visited its parent. Thus the cost of visiting these nodes adds up
to $O(T/B)$, even if we spend a constant number of I/Os in some nodes without
finding $\Theta(B)$ points to report.
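The query procedure can be illustrated with the following static, in-memory Python sketch. It rests on several simplifying assumptions: `fanout` and `k` stand in for $\Theta(B)$ and $B$, the points stored for each child are kept in a plain list that is scanned rather than in an $O(B^2)$-structure, slabs are formed from equal-size groups of the x-sorted points (with their x-ranges taken from those points), and updates and I/O counting are omitted. A slab that only partly overlaps $[q_1, q_2]$ contains $q_1$ or $q_2$, so recursing into it corresponds to following the search paths.

```python
class PSTNode:
    def __init__(self):
        self.leaf_points = None   # set only for leaves
        self.slabs = []           # per child: (lo_x, hi_x, stored_points, child)

def build_pst(pts, fanout=4, k=4, leaf_cap=8):
    """Build over pts (sorted by x), none of which is stored in an ancestor.

    Assumes leaf_cap >= fanout, so that every slab is non-empty.
    """
    node = PSTNode()
    if len(pts) <= leaf_cap:
        node.leaf_points = pts
        return node
    n = len(pts)
    for i in range(fanout):
        slab = pts[i * n // fanout:(i + 1) * n // fanout]
        # The k highest points of the slab are stored here; the rest go below.
        order = sorted(range(len(slab)), key=lambda j: slab[j][1], reverse=True)
        chosen = set(order[:k])
        stored = [slab[j] for j in order[:k]]
        rest = [slab[j] for j in range(len(slab)) if j not in chosen]
        child = build_pst(rest, fanout, k, leaf_cap) if rest else None
        node.slabs.append((slab[0][0], slab[-1][0], stored, child))
    return node

def query_pst(node, q1, q2, q3, out):
    """Append to out all points (x, y) with q1 <= x <= q2 and y >= q3."""
    if node is None:
        return
    if node.leaf_points is not None:
        out.extend(p for p in node.leaf_points if q1 <= p[0] <= q2 and p[1] >= q3)
        return
    for lo, hi, stored, child in node.slabs:
        if hi < q1 or lo > q2:
            continue                            # slab misses [q1, q2]
        hits = [p for p in stored if q1 <= p[0] <= q2 and p[1] >= q3]
        out.extend(hits)
        spanned = q1 <= lo and hi <= q2
        # Partly covered slabs lie on the search paths for q1 or q2; a spanned
        # slab is explored only if all of its stored points were reported.
        if not spanned or len(hits) == len(stored):
            query_pst(child, q1, q2, q3, out)
```

A query is run as `out = []; query_pst(build_pst(sorted(points)), q1, q2, q3, out)`; the reported points are collected in `out` in no particular order.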

Fig. 5. Internal node v with children $v_1, v_2, \ldots, v_5$. The points in bold are stored in
the $O(B^2)$-structure. To answer a 3-sided query we report the relevant points among the
$O(B^2)$ points and answer the query recursively in $v_2$, $v_3$, and $v_5$. The query is not
extended to $v_4$ because not all of the points from $v_4$ in the $O(B^2)$-structure satisfy
the query.

To insert a point p = (x, y) in the external priority search tree we search
down the tree for the leaf containing x, until we reach the node v where p needs
to be inserted in the $O(B^2)$-structure. The $O(B^2)$-structure is static, but since
it has size $O(B^2)$ we can use a global rebuilding idea to make it dynamic [125]:
we simply store the update in a special “update block” and once B updates have
been collected we rebuild the structure using
$O(\frac{B^2}{B}\log_{M/B}\frac{B^2}{B}) = O(B\log_{M/B} B)$ I/Os. Assuming
$M \ge B^2$, that is, that the internal memory is capable of holding B blocks,
this is $O(B)$ and we obtain an $O(1)$ amortized update bound. Arge et al. [26]
showed how to make this worst-case, even without the assumption on the main
memory size. Insertion of p in the $O(B^2)$-structure may result in the $O(B^2)$-structure
containing one too many points from the slab corresponding to the child $v_j$ containing x.
Therefore, apart from inserting p in the $O(B^2)$-structure, we also remove the
point p' with the lowest y-coordinate among the points corresponding to $v_j$. We
insert p' recursively in the tree rooted in $v_j$. Since we use $O(1)$ I/Os in each of
the nodes on the search path, the insertion takes $O(\log_B N)$ I/Os. We also need
to insert x in the base B-tree. This may result in split and/or share operations,
and each such operation may require rebuilding an $O(B^2)$-structure (as well
as movement of some points between $O(B^2)$-structures). Using weight-balanced
B-trees, Arge et al. [26] showed how the rebalancing after an insertion can be
performed in $O(\log_B N)$ I/Os worst case. Deletions can be handled in $O(\log_B N)$
I/Os in a similar way [26].
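The push-down step of the insertion (add p to its slab's share of the $O(B^2)$-structure, and if that slab now holds one point too many, evict the lowest point p' and insert it recursively into the child) can be sketched as follows. This is an illustration under loud assumptions, not the structure of Arge et al.: the hypothetical `Node` class covers an integer x-universe $[lo, hi)$ with a static, evenly split base tree (no weight-balanced rebalancing), each slab's stored points are kept in a Python min-heap keyed by y instead of an $O(B^2)$-structure with an update block, and no I/Os are modeled.

```python
import bisect
import heapq


class Node:
    """A node of a simplified priority search tree over the integers [lo, hi).

    Assumptions of this sketch: the base tree is static and split evenly
    (no weight-balanced rebalancing), and each slab's share of the
    O(B^2)-structure is a min-heap of at most B points keyed by y.
    """

    def __init__(self, lo, hi, B):
        self.lo, self.hi, self.B = lo, hi, B
        width = max(1, -(-(hi - lo) // B))        # ceil((hi - lo) / B)
        self.bounds = list(range(lo, hi, width))  # left boundary of each slab
        self.stores = [[] for _ in self.bounds]   # per-slab min-heaps of (y, x)
        self.children = [None] * len(self.bounds)

    def insert(self, x, y):
        i = bisect.bisect_right(self.bounds, x) - 1  # slab (child v_j) containing x
        heap = self.stores[i]
        heapq.heappush(heap, (y, x))
        if len(heap) > self.B:
            # One point too many from this slab: remove the lowest one (p')
            # and insert it recursively in the subtree of the child.
            y_low, x_low = heapq.heappop(heap)
            if self.children[i] is None:
                right = self.bounds[i + 1] if i + 1 < len(self.bounds) else self.hi
                self.children[i] = Node(self.bounds[i], right, self.B)
            self.children[i].insert(x_low, y_low)
```

Because only the lowest point of a slab is ever pushed down, every point stored below a slab has a y-coordinate no larger than any point kept for that slab in the parent; this is exactly the property the query procedure relies on when it prunes a child.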

The above solution to the 3-sided planar range searching problem illustrates 
some of the problems encountered when developing I/O-efficient dynamic data 
structures, as well as the techniques commonly used to overcome these problems. 
As already discussed, the main problem is that in order to be efficient, external 
tree data structures need to have large fan-out. In the above example this resulted 
in the need for what we called the $O(B^2)$-structure. This structure solved a static
version of the problem on $O(B^2)$ points. The structure was necessary since to
“pay” for a visit to a child node $v_i$, we needed to find $\Theta(B)$ points in the slab
corresponding to $v_i$ satisfying the query. The idea of charging some of the query
cost to the output size is often called filtering [51], and the idea of using a static 
structure on O(B^) elements in each node has been called the bootstrapping 
paradigm [151,152]. Finally, the ideas of weight-balancing and global rebuilding 
were used to obtain worst-case efficient update bounds. All these ideas have been 
used in the development of other efficient external data structures. 

5 General Planar Range Searching 

After discussing 3-sided planar range searching we are now ready to consider
general planar range searching: given a set of points in the plane we want to
be able to find all points contained in a query rectangle. While linear space
and $O(\log_B N + T/B)$ query structures exist for special cases of this problem,
like the 3-sided problem described in Section 4, Subramanian and Ramaswamy
showed that one cannot obtain an $O(\log_B N + T/B)$ query bound using less than
$\Theta(\frac{N}{B}\cdot\frac{\log(N/B)}{\log\log_B N})$ disk blocks [141].¹ This lower bound holds in a natural external
memory version of the pointer machine model [53]. A similar bound in a slightly
different model where the search component of the query is ignored was proved
by Arge et al. [26]. This indexability model was defined by Hellerstein et al. [93]
and considered by several authors [101,106,135].

¹ In fact, this bound even holds for a query bound of $O(\log_B^c N + T/B)$ for any constant c.

Based on a sub-optimal but linear space structure for answering 3-sided
queries, Subramanian and Ramaswamy developed the P-range tree that uses
optimal $O(\frac{N}{B}\cdot\frac{\log(N/B)}{\log\log_B N})$ space but uses more than the optimal $O(\log_B N + T/B)$
I/Os to answer a query [141]. Using their optimal structure for 3-sided queries,
Arge et al. obtained an optimal structure [26]. We discuss the structure in Sec-
tion 5.1 below. In practical applications involving massive datasets it is often
crucial that external data structures use linear space. We discuss this further
in Section 7. Grossi and Italiano developed the elegant linear space cross-tree
data structure which answers planar range queries in $O(\sqrt{N/B} + T/B)$ I/Os [89,90].
This is optimal for linear space data structures, as proven, e.g., by Kanth
and Singh [102]. The O-tree of Kanth and Singh [102] obtains the same bounds
using ideas similar to the ones used by van Kreveld and Overmars in divided k-d
trees [146]. In Section 5.2 below we discuss the cross-tree further.



5.1 Logarithmic Query Structure 

The $O(\log_B N + T/B)$ query data structure is based on ideas from the corre-
sponding internal memory data structure due to Chazelle [51]. The structure
consists of a fan-out $\log_B N$ base tree over the x-coordinates of the N points.
As previously, an x-range is associated with each node v and it is subdivided
into $\log_B N$ slabs by v's children $v_1, v_2, \ldots, v_{\log_B N}$. We store all the points in
the x-range of v in four secondary data structures associated with v. The first
structure stores the points in a linear list sorted by y-coordinate. The three other
structures are external priority search trees. Two of these structures are used
for answering 3-sided queries with the opening to the left and to the right,
respectively. For the third priority search tree, we consider for each child $v_i$
the points in the x-range of $v_i$ in y-order, and for each pair of consecutive
points $(x_1, y_1)$ and $(x_2, y_2)$ we store the point $(y_1, y_2)$ in the tree. With each
constructed point we also store pointers to the corresponding two original points
in the sorted list of points in a child node. Since we use linear space on each of the
$O(\log_{\log_B N}(N/B)) = O(\log(N/B)/\log\log_B N)$ levels of the tree, the structure
uses $O(\frac{N}{B}\cdot\frac{\log(N/B)}{\log\log_B N})$ disk blocks in total.

To answer a 4-sided query $q = (q_1, q_2, q_3, q_4)$ we first find the topmost node v
in the base tree where the x-range $[q_1, q_2]$ of the query contains a boundary
between two slabs. Consider the case where $q_1$ lies in the x-range of $v_i$ and $q_2$
lies in the x-range of $v_j$ (refer to Figure 6). The query q is naturally decomposed
into three parts, consisting of a part in $v_i$, a part in $v_j$, and a part completely
spanning nodes $v_k$, for $i < k < j$. The points contained in the first two parts can
be found in $O(\log_B N + T/B)$ I/Os using the right opening priority search tree in
$v_i$ and the left opening priority search tree in $v_j$. To find the points in the third
part we query the third priority search tree associated with v with $(-\infty, q_3, q_3)$,
that is, we find all points $(y_1, y_2)$ in the structure with $y_1 \le q_3$ and $y_2 \ge q_3$.
Since a point $(y_1, y_2)$ corresponds to a consecutive pair $(x_1, y_1)$ and $(x_2, y_2)$ of
the original points in a slab, we in this way obtain the bottommost point contained
in the query for each of the $O(\log_B N)$ nodes $v_{i+1}, \ldots, v_{j-1}$. Using
the pointers to the same points in these children nodes, we then traverse the
$j - i - 1 = O(\log_B N)$ relevant sorted lists and output the remaining points
using $O(\log_B N + T/B)$ I/Os.
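The trick used for the completely spanned slabs can be illustrated with a small sketch. Assumptions of this illustration: the third priority search tree is replaced by a linear scan over the pair points, a $-\infty$ sentinel pair (our addition, not from the text) handles a slab whose lowest point already lies at or above $q_3$, and the function names are hypothetical. Because each slab's y-values are sorted, exactly one pair satisfies $y_1 < q_3 \le y_2$ whenever the slab has a point at or above $q_3$; from that pair's upper point we walk up the y-sorted list until we pass $q_4$.

```python
import math


def slab_pairs(points_by_y):
    """Turn one slab's y-sorted points into the pair points (y1, y2) stored in
    the third priority search tree. Each pair also keeps the list position k
    of its upper point, playing the role of the stored pointer."""
    ys = [-math.inf] + [p[1] for p in points_by_y]  # -inf sentinel (assumption)
    return [(ys[k], ys[k + 1], k) for k in range(len(points_by_y))]


def report_spanned_slabs(slabs, q3, q4):
    """slabs: y-sorted point lists of the children v_{i+1}, ..., v_{j-1},
    whose x-ranges are completely spanned by the query."""
    out = []
    for points_by_y in slabs:
        # Stand-in for the 3-sided query on the pairs: the unique pair
        # straddling q3 identifies the bottommost point with y >= q3.
        hits = [k for y1, y2, k in slab_pairs(points_by_y) if y1 < q3 <= y2]
        if not hits:
            continue  # every point in this slab lies below q3
        k = hits[0]
        # Follow the pointer and walk up the sorted list until y exceeds q4.
        while k < len(points_by_y) and points_by_y[k][1] <= q4:
            out.append(points_by_y[k])
            k += 1
    return out
```

In the real structure the pairs of all children live in one external priority search tree, so a single 3-sided query returns one starting point per slab at $O(\log_B N + T/B)$ cost; the linear scan here only mimics which pair that query would return.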
Fig. 6. The slabs corresponding to a node v in the base tree. To answer a query
$(q_1, q_2, q_3, q_4)$ we need to answer 3-sided queries on the points in slab $v_i$ and
slab $v_j$, and a range query on the points in the $O(\log_B N)$ slabs between $v_i$
and $v_j$.

Fig. 7. Basic squares. To answer a query $(q_1, q_2, q_3, q_4)$ we check points in two
vertical and two horizontal slabs, and report points in basic squares completely
covered by the query.



To insert or delete a point, we need to perform $O(1)$ updates on each
of the $O(\log(N/B)/\log\log_B N)$ levels of the base tree. Each of these updates
takes $O(\log_B N)$ I/Os. We also need to update the base tree. Using a
weight-balanced B-tree, Arge et al. showed how this can be done in
$O((\log_B N)\log(N/B)/\log\log_B N)$ I/Os [26].

5.2 Linear Space Structure 

The linear space cross-tree structure of Grossi and Italiano consists of two
levels [89,90]. The lower level partitions the plane into $O(\sqrt{N/B})$ vertical slabs
and $O(\sqrt{N/B})$ horizontal slabs containing $O(\sqrt{NB})$ points each, forming an
irregular grid of $O(N/B)$ basic squares (refer to Figure 7). Each basic square
can contain between 0 and $\sqrt{NB}$ points. The points are grouped and stored
according to the vertical slabs: points in vertically adjacent basic squares
containing fewer than B points are grouped together to form groups of $O(B)$ points
and stored in blocks together. The points in a basic square containing more than
B points are stored in a B-tree. Thus the lower level uses $O(N/B)$ space. The
upper level consists of a linear space search structure which can be used to
determine the basic square containing a given point. For now we can think of the
structure as consisting of a fan-out $\sqrt{B}$ B-tree $T_V$ on the $\sqrt{N/B}$ vertical slabs
and a separate fan-out $\sqrt{B}$ B-tree $T_H$ on the $\sqrt{N/B}$ horizontal slabs.

In order to answer a query $(q_1, q_2, q_3, q_4)$ we use the upper level search structure to
find the vertical slabs containing $q_1$ and $q_2$ and the horizontal slabs containing
$q_3$ and $q_4$ using $O(\log_{\sqrt{B}} N) = O(\log_B N)$ I/Os. We then explicitly check all
points in these slabs and report all the relevant points. In doing so we use
$O(\sqrt{NB}/B) = O(\sqrt{N/B})$ I/Os to traverse the vertical slabs and
$O(\sqrt{NB}/B + \sqrt{N/B}) = O(\sqrt{N/B})$ I/Os to traverse the horizontal slabs (the
$\sqrt{N/B}$-term in the latter bound is a result of the slabs being blocked vertically:
a horizontal slab contains $\sqrt{N/B}$ basic squares). Finally, we report all points
corresponding to basic squares fully covered by the query. To do so we use
$O(\sqrt{N/B} + T/B)$ I/Os since the slabs are blocked vertically. In total we answer a
query in $O(\sqrt{N/B} + T/B)$ I/Os.
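A toy in-memory version of the lower level and the query can make the slab bookkeeping concrete. This sketch is an illustration under our own assumptions, not Grossi and Italiano's structure: slab boundaries are taken every $\lfloor\sqrt{NB}\rfloor$ points in sorted x- and y-order, basic squares live in a dictionary instead of blocks and B-trees, binary search on the boundary lists stands in for the upper-level search trees, no I/Os are counted, and all function names are hypothetical. Squares in the two boundary columns or the two boundary rows are checked point by point; every other square in the range is reported wholesale.

```python
import bisect
import math
from collections import defaultdict


def slab_index(bounds, v):
    """Slab whose lower boundary is the largest one <= v (clamped to 0)."""
    return max(0, bisect.bisect_right(bounds, v) - 1)


def build_grid(points, B):
    """Lower level of a simplified cross-tree: about sqrt(N/B) vertical and
    about sqrt(N/B) horizontal slabs of about sqrt(NB) points each."""
    k = max(1, math.isqrt(len(points) * B))  # target points per slab
    xb = sorted(p[0] for p in points)[::k]   # left boundary of each vertical slab
    yb = sorted(p[1] for p in points)[::k]   # lower boundary of each horizontal slab
    squares = defaultdict(list)              # basic square (column, row) -> points
    for p in points:
        squares[(slab_index(xb, p[0]), slab_index(yb, p[1]))].append(p)
    return xb, yb, squares


def grid_query(grid, q1, q2, q3, q4):
    """Report all points with q1 <= x <= q2 and q3 <= y <= q4."""
    xb, yb, squares = grid
    c1, c2 = slab_index(xb, q1), slab_index(xb, q2)
    r1, r2 = slab_index(yb, q3), slab_index(yb, q4)
    out = []
    for c in range(c1, c2 + 1):
        for r in range(r1, r2 + 1):
            pts = squares.get((c, r), [])
            if c in (c1, c2) or r in (r1, r2):
                # Square in a boundary slab: check each point explicitly.
                out.extend(p for p in pts if q1 <= p[0] <= q2 and q3 <= p[1] <= q4)
            else:
                out.extend(pts)  # basic square completely covered by the query
    return out
```

The split into boundary slabs and fully covered squares mirrors the cost analysis above: the explicit checks are confined to four slabs, while covered squares are output without any per-point test.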

In order to perform an update we need to find and update the relevant basic
square. We may also need to split slabs (insertion) or merge slabs with neighbor
slabs (deletion). In order to do so efficiently while still being able to answer a
range query I/O-efficiently, the upper level is actually implemented using a
cross-tree $T_{HV}$, which can be viewed as a cross product of $T_V$ and $T_H$. For each pair of
nodes $u \in T_H$ and $v \in T_V$ on the same level we have a node $(u, v)$ in $T_{HV}$, and for
each pair of edges $(u, u') \in T_H$ and $(v, v') \in T_V$ we have an edge $((u, v), (u', v'))$
in $T_{HV}$. Thus the tree has fan-out $O(B)$ and uses
$O\big(\sum_{i \ge 0} (\sqrt{N/B}/(\sqrt{B})^i)^2\big) = O\big(\sum_{i \ge 0} N/B^{i+1}\big) = O(N/B)$
space. Grossi and Italiano showed how we can use the cross-tree to search for
a basic square in $O(\log_B N)$ I/Os and how the full structure can be used to
answer a range query in $O(\sqrt{N/B} + T/B)$ I/Os [89,90]. They also showed that
if $T_H$ and $T_V$ are implemented using weight-balanced B-trees, the structure can
be maintained in $O(\log_B N)$ I/Os during an update.

6 Survey of Other External Data Structures 

After having discussed planar range searching in some detail, in this section we
survey other results on worst-case efficient external data structures.

Interval management. The interval management (or stabbing query) problem 
is the problem of maintaining a dynamically changing set of (one-dimensional) 
intervals such that given a query point q all intervals containing q can be reported 
efficiently. By mapping each interval [x, y] to the point (x, y) in the plane, a
query corresponds to finding all points such that $x \le q$ and $y \ge q$, which means
that the external priority search tree described in Section 4.2 can be used to
solve this problem optimally. However, the problem was first solved optimally
by Arge and Vitter [30], who developed an external version of the interval tree 
of Edelsbrunner [71,72]. 
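The reduction from stabbing queries to a 2-sided range query can be checked with a brute-force stand-in. In this sketch a sort plus a prefix scan (our simplification) replaces the external priority search tree, so it demonstrates only the mapping from interval [x, y] to point (x, y), not the I/O bound; `stabbing_query` is a hypothetical name.

```python
def stabbing_query(intervals, q):
    """Return the intervals [x, y] (as tuples) that contain the point q."""
    points = sorted(intervals)  # each interval [x, y] becomes the point (x, y)
    out = []
    for x, y in points:
        if x > q:
            break  # points are x-sorted, so no later point has x <= q
        if y >= q:
            out.append((x, y))
    return out
```

For instance, with the intervals [1, 5], [2, 3], [4, 8] and [6, 7], the query q = 4 reports [1, 5] and [4, 8].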

The external interval tree was the first to use several of the ideas also utilized
in the external priority search tree: filtering, bootstrapping, and weight-balanced
B-trees. The structure also utilized the notion of multislabs, which is useful
when storing objects (like intervals) with a spatial extent. Recall that a slab
is a subrange of the range associated with a node v of a base tree, defined by
the range of one of v's children. A multislab is simply a contiguous set of
slabs. A key idea in the external interval tree is to decrease the fan-out of the
base tree to $\sqrt{B}$, maintaining the $O(\log_B N)$ tree height, such that each node
has $O((\sqrt{B})^2) = O(B)$ associated multislabs. This way a constant amount of
information about each multislab can be stored in $O(1)$ blocks in total. Similar ideas
have been utilized in several other external data structures [14,25,3,28]. Variants
of the external interval tree structure, as well as applications of it in isosurface
extraction, have been considered by Chiang and Silva [29,60,62,61] (see also [7]).
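The multislab count behind the choice of fan-out $\sqrt{B}$ is easy to check: a node with f slabs has $f(f+1)/2$ contiguous slab ranges, which is at most B when $f = \sqrt{B}$. The sketch below enumerates them and, purely as an illustration (our assumption about how one might locate a multislab, with hypothetical helper names), finds the multislab completely spanned by an interval given the slab boundaries.

```python
import bisect
from itertools import combinations_with_replacement


def multislabs(fanout):
    """All contiguous slab ranges (i, j) with 0 <= i <= j < fanout."""
    return list(combinations_with_replacement(range(fanout), 2))


def spanned_multislab(bounds, lo, hi):
    """bounds[k] and bounds[k + 1] delimit slab k. Return the multislab (i, j)
    of slabs completely covered by [lo, hi], or None if no slab is covered."""
    i = bisect.bisect_left(bounds, lo)       # first slab whose left boundary >= lo
    j = bisect.bisect_right(bounds, hi) - 2  # last slab whose right boundary <= hi
    return (i, j) if i <= j else None
```

With B = 64 and fan-out 8, for example, there are 36 multislabs, so one constant-size record per multislab still fits in $O(1)$ blocks.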

Planar point location. The planar point location problem is defined as follows: 
Given a planar subdivision with N vertices (i.e., a decomposition of the plane 
into polygonal regions induced by a straight-line planar graph), construct a data
structure so that the face containing a query point p can be reported efficiently.
In internal memory, a lot of work has been done on this problem (see e.g. the
survey by Snoeyink [140]). Goodrich et al. [86] described the first query optimal
$O(\log_B N)$ I/O static solution to the problem, and several structures which can
answer a batch of queries I/O-efficiently have also been developed [86,29,25,65, 
143]. 

Recently, progress has been made in the development of I/O-efficient dynamic 
point location structures. In the dynamic version of the problem one can change 
the subdivision dynamically (insert and delete edges and vertices). Based on 
the external interval tree structure and ideas also utilized in several internal 
memory structures [56,34], Agarwal et al. [3] developed a dynamic structure 
for monotone subdivisions. Utilizing the logarithmic method and a technique 
similar to dynamic fractional cascading [54,114], Arge and Vahrenhold improved 
the structure to work for general subdivisions. Their structure uses linear space 
and supports updates and queries in $O(\log_B N)$ I/Os.

Range counting. Given a set of N points in the plane, a range counting 
query asks for the number of points within a query rectangle. Based on ideas 
utilized in an internal memory counting structure due to Chazelle [52], Agarwal
et al. [6] designed an external data structure for the range counting problem.
Their structure uses linear space and answers a query in $O(\log_B N)$ I/Os. Based
on a reduction due to Edelsbrunner and Overmars [74], they also designed a
linear space and $O(\log_B N)$ query structure for the rectangle counting problem.
In this problem, given a set of N rectangles in the plane, a query asks for the 
number of rectangles intersecting a query rectangle. Finally, they extended their 
structures to the d-dimensional versions of the two problems. See also [157] and 
references therein. 

Higher-dimensional range searching. Vengroff and Vitter [149] presented a 
data structure for 3-dimensional range searching with a logarithmic query bound. 
With recent modifications their structure answers queries in $O(\log_B N + T/B)$
I/Os and uses $O(\frac{N}{B}\log^3(N/B)/(\log\log_B N)^3)$ space [151]. More generally, they
presented structures for answering $(3+k)$-sided queries (k of the dimensions, $0 \le k \le 3$,
have finite ranges) in $O(\log_B N + T/B)$ I/Os using $O(\frac{N}{B}\log^k(N/B)/(\log\log_B N)^k)$
space.

As mentioned, space use is often as crucial as query time when manipulat- 
ing massive datasets. The linear space cross-tree of Grossi and Italiano [89,90], 
as well as the O-tree of Kanth and Singh [102], can be extended to support
d-dimensional range queries in $O((N/B)^{1-1/d} + T/B)$ I/Os. Updates can be
performed in $O(\log_B N)$ I/Os. The cross-tree can also be used in the design of
dynamic data structures for several other problems [89,90].

Halfspace range searching. Given a set of points in d-dimensional space, a 
halfspace range query asks for all points on one side of a query hyperplane. Halfs- 
pace range searching is the simplest form of non-isothetic (non-orthogonal) range 
searching. The problem was first considered in external memory by Franciosa and 
Talamo [83,82]. Based on an internal memory structure due to Chazelle et al. [55],
Agarwal et al. [5] described an optimal $O(\log_B N + T/B)$ query and linear space
structure for the 2-dimensional case. Using ideas from an internal memory result
of Chan [50], they described a structure for the 3-dimensional case, answering
queries in $O(\log_B N + T/B)$ expected I/Os but requiring $O((N/B)\log(N/B))$
space. Based on the internal memory partition trees of Matoušek [112], they also
gave a linear space data structure for answering d-dimensional halfspace range
queries in $O((N/B)^{1-1/d+\varepsilon} + T/B)$ I/Os for any constant $\varepsilon > 0$. The structure
supports updates in $O(\log(N/B)\log_B N)$ expected I/Os amortized. Using
an improved construction algorithm, Agarwal et al. [4] obtained an $O(\log_B^2 N)$
amortized and expected update I/O-bound for the planar case. Agarwal et al. [5]
also showed how the query bound of the structure can be improved at the expense
of extra space. Finally, their linear space structure can also be used to answer
very general queries; more precisely, all points within a query polyhedron with
m faces can be found in $O(m(N/B)^{1-1/d+\varepsilon} + T/B)$ I/Os.

Range searching on moving points. Recently there has been an increasing 
interest in external memory data structures for storing continuously moving 
objects. A key goal is to develop structures that only need to be changed when 
the velocity or direction of an object changes (as opposed to continuously). 

Kollios et al. [105] presented initial work on storing moving points in the
plane such that all points inside a query range at query time t can be reported
in a provably efficient number of I/Os. Their results were improved and extended
by Agarwal et al. [4] who developed a linear space structure that answers a query
in $O((N/B)^{1/2+\varepsilon} + T/B)$ I/Os for any constant $\varepsilon > 0$. A point can be updated
using $O(\log_B^2 N)$ I/Os. The structure is based on partition trees and can also
be used to answer queries where two time values $t_1$ and $t_2$ are given and we
want to find all points that lie in the query range at any time between $t_1$ and
$t_2$. Using the notion of kinetic data structures introduced by Basch et al. [33], as
well as a persistent version of the planar range searching structure [26] discussed 
in Section 5, Agarwal et al. [4] also developed a number of other structures with 
improved query performance. One of these structures has the property that 
queries in the near future are answered faster than queries further away in time. 
Further structures with this property were developed by Agarwal et al. [10]. 

Proximity queries. Proximity queries such as nearest neighbor and closest 
pair queries have become increasingly important in recent years, for example 
because of their applications in similarity search and data mining. Callahan 
et al. [47] developed the first worst-case efficient external proximity query data 
structures. Their structures are based on an external version of the topology trees
of Frederickson [84] called topology B-trees, which can be used to dynamically 
maintain arbitrary binary trees I/O-efficiently. 

Using topology B-trees and ideas from an internal structure of Bespamyat- 
nikh [42], Callahan et al. [47] designed a linear space data structure for dynam- 
ically maintaining the closest pair of a set of points in d-dimensional space. The 
structure supports updates in $O(\log_B N)$ I/Os. The same result was obtained by
Govindarajan et al. [87] using the well-separated pair decomposition of Callahan
and Kosaraju [48,49]. Govindarajan et al. [87] also showed how to dynamically
maintain a well-separated pair decomposition of a set of d-dimensional points
using $O(\log_B N)$ I/Os per update.

Using topology B-trees and ideas from an internal structure due to Arya 
et al. [31], Callahan et al. [47] developed a linear space data structure for the 
dynamic approximate nearest neighbor problem. Given a set of points in d-
dimensional space, a query point p, and a parameter $\varepsilon$, the approximate nearest
neighbor problem consists of finding a point q with distance at most $(1 + \varepsilon)$ times
the distance of the actual nearest neighbor of p. The structure answers queries
and supports updates in $O(\log_B N)$ I/Os. Agarwal et al. [4] designed I/O-efficient
data structures for answering approximate nearest neighbor queries on a set of
moving points.
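The approximation guarantee in this definition can be written as a one-line check. The brute-force helper below is hypothetical and computes the exact nearest-neighbor distance by scanning all points, so it illustrates only the $(1+\varepsilon)$ condition, not the I/O-efficient structure.

```python
import math


def is_approx_nearest(points, p, q, eps):
    """True if q is an eps-approximate nearest neighbor of p among points,
    i.e. dist(p, q) <= (1 + eps) * dist(p, nearest point)."""
    nearest = min(math.dist(p, r) for r in points)
    return math.dist(p, q) <= (1 + eps) * nearest
```

For the points (0, 0), (3, 0), (0, 4) and p = (1, 0), the point (3, 0) is at distance 2 while the true nearest neighbor is at distance 1, so it qualifies for $\varepsilon = 1$ but not for $\varepsilon = 0.5$.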

In some applications we are interested in finding not only the nearest but all 
the k nearest neighbors of a query point. Based on their 3-dimensional halfspace
range searching structure, Agarwal et al. [5] described a structure that uses
$O((N/B)\log(N/B))$ space to store N points in the plane such that a k nearest
neighbors query can be answered in $O(\log_B N + k/B)$ I/Os.

7 Practical General-Purpose Structures 

Although several of the worst-case efficient (and often optimal) data structures 
discussed in the previous sections are simple enough to be of practical interest, 
they are often not the obvious choices when deciding which data structures to 
use in a real-world application. There are several reasons for this, one of the 
most important being that in real applications involving massive datasets it is 
practically feasible to use data structures of size cN/B only for a very small 
constant c. Since fundamental lower bounds often prevent logarithmic worst- 
case search cost for even relatively simple problems when restricting the space 
use to linear, we need to develop heuristic structures which perform well in most 
practical cases. Space restrictions also motivate us not to use structures for single 
specialized queries but instead design general structures that can be used to 
answer several different types of queries. Finally, implementation considerations 
often motivate us to sacrifice worst-case efficiency for simplicity. All of these 
considerations have led to the development of a large number of general-purpose 
data structures that often work well in practice, but which do not come with 
worst-case performance guarantees. Below we quickly survey the major classes 
of such structures. The reader is referred to more complete surveys for details 
[12,85,121,88,124,134]. 
Range searching in d-dimensions is the most extensively researched problem.
A large number of structures have been developed for this problem, including
space filling curves (see e.g. [123,1,32]), grid-files [119,94], various quad-trees [133,
134], kd-B-trees [128] (and variants like Buddy-trees [138], hB-trees [109,75] and
cell-trees [91]) and various R-trees [92,88,139,37,100]. Often these structures are
broadly classified into two types, namely space driven structures (like quad-trees
and grid-files), which partition the embedded space containing the data points,
and data driven structures (like kd-B-trees and R-trees), which partition the data
points themselves. Agarwal et al. [9] describe a general framework for efficient 
construction and updating of many of the above structures. 

As mentioned above, we often want to be able to answer a very diverse set 
of queries, like halfspace range queries, general polygon range queries, and point 
location queries, on a single data structure. Many of the above data structures 
can easily be used to answer many such different queries and that is one main 
reason for their practical success. Recently, there has also been a lot of work on 
extensions — or even new structures — which also support e.g. moving objects (see 
e.g. [155,132,154,126] and references therein) or proximity queries (see e.g. [41,
129,103,85,12,121] and references therein). However, as discussed, most often no 
guarantee on the worst-case query performance is provided for these structures. 

So far we have mostly discussed point data structures. In general, we are 
interested in storing objects such as lines and polyhedra with a spatial extent. 
Like in the point case, a large number of heuristic structures, many of which 
are variations of the ones mentioned above, have been proposed for such ob- 
jects. However, almost no worst-case efficient structures are known. In practice 
a filtering/refinement method is often used when managing objects with spatial
extent. Instead of directly storing the objects in the data structure we store 
the minimal bounding (axis-parallel) rectangle containing each object together 
with a pointer to the object itself. When answering a query we first find all the 
minimal bounding rectangles fulfilling the query (the filtering step) and then we 
retrieve the objects corresponding to these rectangles and check each of them 
to see if they fulfill the query (the refinement step). One way of designing data 
structures for rectangles (or even more general objects) is to transform them into 
points in higher-dimensional space and store these points in one of the point data 
structures discussed above (see e.g. [85,121] for a survey). However, a structure 
based on another idea has emerged as especially efficient for storing and query- 
ing minimal bounding rectangles. Below we further discuss this so-called R-tree 
and its many variants. 
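The filtering/refinement scheme can be illustrated with polygons as the stored objects and a point as the query. In this sketch a plain dictionary stands in for the index structure and the helper names are hypothetical: the filter step keeps objects whose minimal bounding rectangle contains the point, and the refinement step runs an exact even-odd (ray casting) point-in-polygon test on the survivors only.

```python
def mbr(polygon):
    """Minimal axis-parallel bounding rectangle (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt, polygon):
    """Even-odd rule: count crossings of a ray from pt towards +x."""
    x, y = pt
    inside = False
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_objects(objects, pt):
    """objects: id -> (bounding rectangle, polygon)."""
    # Filtering step: cheap test against the stored bounding rectangles.
    candidates = [(oid, poly) for oid, (box, poly) in objects.items()
                  if box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]]
    # Refinement step: exact geometric test on the candidates only.
    return [oid for oid, poly in candidates if point_in_polygon(pt, poly)]
```

For a triangle with corners (0, 0), (4, 0), (0, 4), the query point (3, 3) passes the filter (it lies in the triangle's bounding box) but is rejected by refinement, which is exactly the kind of false hit the refinement step exists to remove.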



R-trees. The R-tree, originally proposed by Guttman [92], is a multiway tree 
very similar to a B-tree; all leaf nodes are on the same level of the tree and 
a leaf contains $O(B)$ data rectangles. Each internal node v (except maybe for
the root) has $O(B)$ children. For each of its children $v_i$, v contains the minimal
bounding rectangle of all the rectangles in the tree rooted in $v_i$. An R-tree has
height $O(\log_B N)$ and uses $O(N/B)$ space. An example of an R-tree is shown in
Figure 8. Note that there is no unique R-tree for a given set of data rectangles 
and that minimal bounding rectangles stored within an R-tree node can overlap. 
Fig. 8. R-tree constructed on rectangles A, B, C, ..., I (B = 3).



In order to query an R-tree to find, say, all rectangles containing a query 
point p, we start at the root and recursively visit all children whose minimal 
bounding rectangle contains p. This way we visit all internal nodes whose min- 
imal bounding rectangle contains p. There can be many more such nodes than 
actual data rectangles containing p and intuitively we want the minimal bound- 
ing rectangles stored in an internal node to overlap as little as possible in order 
to obtain a query efficient structure. 
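The recursive point query can be sketched on a tiny hand-built two-level R-tree. The `Leaf` and `Internal` classes and the visited-node counter are assumptions of this illustration, not part of any particular R-tree implementation; the counter shows how overlapping minimal bounding rectangles can force the query to visit leaves that contribute nothing.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    entries: list   # (name, rect) data rectangles; rect = (xmin, ymin, xmax, ymax)


@dataclass
class Internal:
    children: list  # (mbr, node) pairs, one per child


def contains(rect, p):
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def point_query(node, p, stats):
    """Names of data rectangles containing p; stats["visited"] counts nodes."""
    stats["visited"] += 1
    if isinstance(node, Leaf):
        return [name for name, rect in node.entries if contains(rect, p)]
    out = []
    for box, child in node.children:
        if contains(box, p):  # overlapping MBRs can send the search down several paths
            out.extend(point_query(child, p, stats))
    return out
```

With a leaf holding A = (0, 0, 4, 4) and B = (3, 3, 6, 6) under the bounding rectangle (0, 0, 6, 6), and a leaf holding C = (5, 5, 9, 9) and D = (8, 0, 9, 2) under (5, 0, 9, 9), the query point (7, 1) visits the second leaf but finds nothing there.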

An insertion can be performed in $O(\log_B N)$ I/Os like in a B-tree. We first
traverse the path from the root to the leaf we choose to insert the new rectangle 
into. The insertion might result in the need for node splittings on the same root- 
leaf path. As insertion of a new rectangle can increase the overlap in a node, 
several heuristics for choosing which leaf to insert a new rectangle into, as well as 
for splitting nodes during rebalancing, have been proposed [88,139,37,100]. The 
R*-tree variant of Beckmann et al. [37] seems to result in the best performance in 
many cases. Deletions are also performed similarly to deletions in a B-tree but we 
cannot guarantee an $O(\log_B N)$ bound since finding the data rectangle to delete
may require many more I/Os. Rebalancing after a deletion can be performed 
by fusing nodes like in a B-tree but some R-tree variants instead delete a node 
when it underflows and reinsert its children into the tree (often referred to as 
“forced reinsertion”). The idea is to try to obtain a better structure by forcing a 
global reorganization of the structure instead of the local reorganization a node 
fuse constitutes. 
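As one concrete example of such a heuristic, Guttman's original R-tree descends into the child whose minimal bounding rectangle needs the least area enlargement to include the new rectangle, breaking ties by smaller area. A minimal sketch of that single rule follows; the helper names are ours, and node splitting, forced reinsertion and the R*-tree criteria are not shown.

```python
def area(r):
    """Area of rectangle r = (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def union(r, s):
    """Minimal bounding rectangle of rectangles r and s."""
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))


def choose_child(child_mbrs, new_rect):
    """Index of the child whose MBR grows least (in area) to cover new_rect;
    ties are broken by the smaller current area, then by lower index."""
    def cost(k):
        r = child_mbrs[k]
        return (area(union(r, new_rect)) - area(r), area(r))
    return min(range(len(child_mbrs)), key=cost)
```

Applying this rule at every level along the root-to-leaf path selects the leaf that receives the new rectangle; the quality of the resulting tree depends heavily on this choice, which is why so many alternative heuristics have been proposed.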

Constructing an R-tree using repeated insertion takes $O(N\log_B N)$ I/Os
and does not necessarily result in a good tree in terms of query performance.
Therefore several sorting based $O(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/O construction algorithms
have been proposed [130,99,69,108,40]. Several of these algorithms produce an
R-tree with practically better query performance than an R-tree built by repeated
insertion. Still, no better than a linear worst-case query I/O-bound has been 
proven for any of them. Very recently, however, de Berg et al. [68] and Agarwal 
et al. [11] presented R-tree construction algorithms resulting in R-trees with 
provably efficient worst-case query performance measured in terms of certain 
parameters describing the input data. They also discussed how these structures 
can be efficiently maintained dynamically. 
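To make sorting-based construction concrete, here is a deliberately simple packing scheme (our own illustration, not any of the cited algorithms): sort the data rectangles by the x-coordinate of their centers, cut the sorted sequence into nodes of B rectangles, and repeat on the resulting bounding rectangles until a single root remains. It runs in memory with Python's built-in sort; an external implementation would use an I/O-efficient sort to achieve the bound above. The name `bulk_load` and the level-list representation are hypothetical.

```python
def mbr_of(rects):
    """Minimal bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def bulk_load(rects, B):
    """Return the levels of a packed R-tree, leaves first. Each node is a pair
    (mbr, members), where members are indices into rects (leaf level) or into
    the list of nodes on the level below."""
    assert B >= 2, "with B = 1 the number of nodes would never shrink"
    if not rects:
        return []
    # Sort once by x-center (xmin + xmax is twice the center; same order).
    order = sorted(range(len(rects)), key=lambda k: rects[k][0] + rects[k][2])
    level = [(rects[k], k) for k in order]  # (rectangle, payload) items to pack
    levels = []
    while True:
        nodes = []
        for start in range(0, len(level), B):
            group = level[start:start + B]
            nodes.append((mbr_of([r for r, _ in group]), [pl for _, pl in group]))
        levels.append(nodes)
        if len(nodes) == 1:
            return levels
        level = [(box, idx) for idx, (box, _) in enumerate(nodes)]
```

Every node except possibly the last one on each level is completely full, which is what makes packed trees space-efficient; whether their query performance is good depends on how well the sort order clusters nearby rectangles.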
8 Implementation of I/O-Efficient Data Structures

Two ongoing projects aim at developing software packages that facilitate
implementation of I/O-efficient algorithms and data structures in a high-level,
portable and efficient way. These projects are the LEDA-SM project at MPI 
in Germany [66,67] and the TPIE (Transparent Parallel I/O Programming En- 
vironment) project at Duke [17,148]. We briefly discuss these projects and the ex- 
periments performed within them below. Outside these projects, a few other au- 
thors have reported on stand-alone implementations of geometric algorithms [57, 
58], external interval trees [60,61,62], buffer trees [96], and string B-trees [79]. 

LEDA-SM. LEDA-SM is an extension of the LEDA library [115] of efficient 
algorithms and data structures. It consists of a kernel that gives an abstract 
view of external memory as a collection of disks, each consisting of a collection 
of blocks. The kernel provides a number of primitives for manipulating blocks, 
which facilitate efficient implementation of external memory algorithms and data 
structures. The LEDA-SM distribution also contains a collection of fundamental 
data structures, such as stacks, queues, heaps, B-trees and buffer-trees, as well as 
a few fundamental algorithms such as external sorting and matrix operations. It 
also contains algorithms and data structures for manipulating strings and mas- 
sive graphs. Results on the practical performance of LEDA-SM implementations 
of external priority queues and I/O-efficient construction of suffix arrays can be 
found in [44] and [64], respectively. 

TPIE. The first part of the TPIE project took a stream-based approach to 
computation [148,17], where the kernel feeds a continuous stream of elements to 
the user programs in an I/O-efficient manner. This approach is justified by
theoretical research on I/O-efficient algorithms, which shows that a large number of
problems can be solved using a small number of streaming paradigms, all imple- 
mented in TPIE. This part of TPIE also contains fundamental data structures 
such as queues and stacks, algorithms for sorting and matrix operations, as well 
as a few more specialized geometric algorithms. It has been used in I/O-efficient 
implementations of several scientific computation [150], spatial join [24,25,23], 
and terrain flow [27,19] algorithms. 

Since most external data structures cannot be implemented efficiently in
a stream-based framework, the second part of the TPIE project adds kernel
support for a block-oriented programming style. As in LEDA-SM, the external
memory is viewed as a collection of blocks, and primitives for manipulating such
blocks are provided. Fundamental data structures such as B-trees and R-trees
are also provided with this part of TPIE. The block-oriented part of TPIE is
integrated with the stream-oriented part, and together the two parts have been
used to implement I/O-efficient algorithms for R-tree construction [21] based on
the buffer technique, to implement I/O-efficient algorithms for construction
and updating of kd-trees [8,9], as well as to implement the recently developed
structures for range counting [6]. Other external data structures currently being
implemented include persistent B-trees, structures for planar point location,
and the external priority search tree.
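To illustrate the block-oriented style, the following Python sketch models external memory as numbered blocks with read and write primitives, each counted as one I/O. On top of them it builds a one-level index: sorted keys are packed into full blocks and the first key of each block is kept in memory, roughly the leaf level of a B-tree. All names are hypothetical; neither TPIE's nor LEDA-SM's actual interface is reproduced.

```python
import bisect
from typing import Dict, List, Tuple


class BlockCollection:
    """Hypothetical block-oriented external memory (not TPIE's or LEDA-SM's API).

    Memory is a collection of numbered blocks of at most B keys each;
    every read_block/write_block call is counted as one I/O.
    """

    def __init__(self, block_size: int) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.B = block_size
        self._blocks: Dict[int, List[int]] = {}
        self.io_count = 0

    def allocate(self) -> int:
        bid = len(self._blocks)  # blocks are never freed in this sketch
        self._blocks[bid] = []
        return bid

    def write_block(self, bid: int, keys: List[int]) -> None:
        if len(keys) > self.B:
            raise ValueError("block overflow")
        self._blocks[bid] = list(keys)
        self.io_count += 1

    def read_block(self, bid: int) -> List[int]:
        self.io_count += 1
        return list(self._blocks[bid])


def pack_sorted(store: BlockCollection,
                keys: List[int]) -> Tuple[List[int], List[int]]:
    """Pack sorted keys into full blocks; return (first key per block, block ids).

    The first keys act as a small in-memory index, like the routing keys
    of a single B-tree level.
    """
    keys = sorted(keys)
    fence_keys: List[int] = []
    block_ids: List[int] = []
    for start in range(0, len(keys), store.B):
        chunk = keys[start:start + store.B]
        bid = store.allocate()
        store.write_block(bid, chunk)
        fence_keys.append(chunk[0])
        block_ids.append(bid)
    return fence_keys, block_ids


def contains(store: BlockCollection, fence_keys: List[int],
             block_ids: List[int], key: int) -> bool:
    """Membership query that reads at most one block."""
    i = bisect.bisect_right(fence_keys, key) - 1
    if i < 0:
        return False  # smaller than every stored key: no I/O needed
    return key in store.read_block(block_ids[i])
```

With B = 4 and the 20 even keys 0, 2, ..., 38, packing writes five blocks, and each membership query then reads exactly one block (or none, for keys below the smallest stored key).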
9 Conclusions

In this paper we have discussed recent advances in the development of prov- 
ably efficient external memory dynamic data structures, mainly for geometric 
objects and especially for the one- and two-dimensional range searching prob- 
lems. A more detailed survey by the author can be found in [16]. Even though a 
lot of progress has been made, many problems still remain open. For example, 
$O(\log_B N)$-query and space-efficient structures still need to be found for many
higher-dimensional problems.

Acknowledgments. The author thanks Tammy Bailey, Tavi Procopiuc, and 
Jan Vahrenhold for comments on earlier drafts of this paper. 

References 

1. D. J. Abel and D. M. Mark. A comparative analysis of some two-dimensional 
orderings. Intl. J. Geographic Information Systems, 4(1):21-31, 1990.

2. J. Abello, A. L. Buchsbaum, and J. R. Westbrook. A functional approach to 
external graph algorithms. In Proc. Annual European Symposium on Algorithms, 
LNCS 1461, pages 332-343, 1998. 

3. P. K. Agarwal, L. Arge, G. S. Brodal, and J. S. Vitter. I/O-efficient dynamic 
point location in monotone planar subdivisions. In Proc. ACM-SIAM Symp. on 
Discrete Algorithms, pages 1116-1127, 1999. 

4. P. K. Agarwal, L. Arge, and J. Erickson. Indexing moving points. In Proc. ACM
Symp. Principles of Database Systems, pages 175-186, 2000.

5. P. K. Agarwal, L. Arge, J. Erickson, P. Franciosa, and J. Vitter. Efficient searching
with linear constraints. Journal of Computer and System Sciences, 61(2):194-216,
2000.

6. P. K. Agarwal, L. Arge, and S. Govindarajan. CRB-tree: An optimal indexing
scheme for 2d aggregate queries. Manuscript, 2001. 

7. P. K. Agarwal, L. Arge, T. M. Murali, K. Varadarajan, and J. S. Vitter. I/O-efficient
algorithms for contour line extraction and planar graph blocking. In
Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 117-126, 1998. 

8. P. K. Agarwal, L. Arge, O. Procopiuc, and J. S. Vitter. Dynamic kd-trees on
large data sets. Manuscript, 2001. 

9. P. K. Agarwal, L. Arge, O. Procopiuc, and J. S. Vitter. A framework for index 
bulk loading and dynamization. In Proc. Annual International Colloquium on 
Automata, Languages, and Programming, 2001. 

10. P. K. Agarwal, L. Arge, and J. Vahrenhold. A time responsive indexing scheme 
for moving points. In Proc. Workshop on Algorithms and Data Structures, 2001. 

11. P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar, and H. J. Haverkort.
Box-trees and R-trees with near-optimal query time. In Proc. ACM Symp. on 
Computational Geometry, pages 124-133, 2001. 

12. P. K. Agarwal and J. Erickson. Geometric range searching and its relatives. In 
B. Chazelle, J. E. Goodman, and R. Pollack, editors, Advances in Discrete and
Computational Geometry, volume 223 of Contemporary Mathematics, pages 1-56. 
American Mathematical Society, Providence, RI, 1999. 

13. A. Aggarwal and J. S. Vitter. The Input/Output complexity of sorting and related 
problems. Communications of the ACM, 31(9):1116-1127, 1988. 




22 



L. Arge 



14. L. Arge. The buffer tree: A new technique for optimal I/O-algorithms. In Proc. 
Workshop on Algorithms and Data Structures, LNCS 955, pages 334-345, 1995. 
A complete version appears as BRICS technical report RS-96-28, University of
Aarhus. 

15. L. Arge. The I/O-complexity of ordered binary-decision diagram manipulation.
In Proc. Int. Symp. on Algorithms and Computation, LNCS 1004, pages 82-91,
1995. A complete version appears as BRICS technical report RS-96-29, University
of Aarhus.

16. L. Arge. External memory data structures. In J. Abello, P. M. Pardalos, and 
M. G. C. Resende, editors, Handbook of Massive Data Sets. Kluwer Academic
Publishers, 2001. (To appear). 

17. L. Arge, R. Barve, O. Procopiuc, L. Toma, D. E. Vengroff, and R. Wickremesinghe.
TPIE User Manual and Reference (edition 0.9.01a). Duke University,
1999. The manual and software distribution are available on the web at
http://www.cs.duke.edu/TPIE/.

18. L. Arge, G. S. Brodal, and L. Toma. On external memory MST, SSSP and multi-way
planar graph separation. In Proc. Scandinavian Workshop on Algorithms
Theory, LNCS 1851, pages 433-447, 2000. 

19. L. Arge, J. Chase, P. Halpin, L. Toma, D. Urban, J. Vitter, and R. Wickremesinghe.
Flow computation on massive grids. In 16th Annual Symposium
of the International Association of Landscape Ecology (US-IALE 2001), 2001.

20. L. Arge, P. Ferragina, R. Grossi, and J. Vitter. On sorting strings in external 
memory. In Proc. ACM Symp. on Theory of Computation, pages 540-548, 1997. 

21. L. Arge, K. H. Hinrichs, J. Vahrenhold, and J. S. Vitter. Efficient bulk operations 
on dynamic R-trees. In Proc. Workshop on Algorithm Engineering, LNCS 1619, 
pages 328-347, 1999. 

22. L. Arge, U. Meyer, L. Toma, and N. Zeh. On external-memory planar depth first 
search. In Proc. Workshop on Algorithms and Data Structures, 2001. 

23. L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, J. Vahrenhold, and J. S. Vitter. A 
unified approach for indexed and non-indexed spatial joins. In Proc. Conference 
on Extending Database Technology, 1999. 

24. L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter. Scal- 
able sweeping-based spatial join. In Proc. International Conf. on Very Large 
Databases, 1998. 

25. L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, and J. S. Vitter. Theory and prac- 
tice of I/O-efficient algorithms for multidimensional batched searching problems. 
In Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 685-694, 1998. 

26. L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and 
optimal range search indexing. In Proc. ACM Symp. Principles of Database Sys- 
tems, pages 346-357, 1999. 

27. L. Arge, L. Toma, and J. S. Vitter. I/O-efficient algorithms for problems on
grid-based terrains. In Proc. Workshop on Algorithm Engineering and Experi- 
mentation, 2000. 

28. L. Arge and J. Vahrenhold. I/O-efficient dynamic planar point location. In Proc.
ACM Symp. on Computational Geometry, pages 191-200, 2000. 

29. L. Arge, D. E. Vengroff, and J. S. Vitter. External-memory algorithms for processing
line segments in geographic information systems. In Proc. Annual European
Symposium on Algorithms, LNCS 979, pages 295-310, 1995. To appear in special
issues of Algorithmica on Geographical Information Systems.




External Memory Data Structures 



23 



30. L. Arge and J. S. Vitter. Optimal dynamic interval management in external 
memory. In Proc. IEEE Symp. on Foundations of Comp. Sci., pages 560-569,
1996. 

31. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal 
algorithm for approximate nearest neighbor searching. In Proc. 5th ACM-SIAM 
Sympos. Discrete Algorithms, pages 573-582, 1994. 

32. T. Asano, D. Ranjan, T. Roos, E. Welzl, and P. Widmayer. Space-filling curves 
and their use in the design of geometric data structures. Theoret. Comput. Sci., 
181(1):3-15, July 1997. 

33. J. Basch, L. J. Guibas, and J. Hershberger. Data structures for mobile data. 
Journal of Algorithms, 31(1):1-28, 1999.

34. H. Baumgarten, H. Jung, and K. Mehlhorn. Dynamic point location in general 
subdivisions. Journal of Algorithms, 17:342-380, 1994. 

35. R. Bayer and E. McCreight. Organization and maintenance of large ordered 
indexes. Acta Informatica, 1:173-189, 1972. 

36. B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer. An asymptotically 
optimal multiversion B-tree. VLDB Journal, 5(4):264-275, 1996. 

37. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient
and robust access method for points and rectangles. In Proc. SIGMOD Intl.
Conf. on Management of Data, pages 322-331, 1990.

38. M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. 
In Proc. IEEE Symp. on Foundations of Comp. Sci., pages 339-409, 2000. 

39. J. L. Bentley. Decomposable searching problems. Information Processing Letters, 
8(5):244-251, 1979. 

40. S. Berchtold, C. Bohm, and H.-P. Kriegel. Improving the query performance of 
high-dimensional index structures by bulk load operations. In Proc. Conference
on Extending Database Technology, LNCS 1311, pages 216-230, 1998.

41. S. Berchtold, B. Ertl, D. A. Keim, H.-P. Kriegel, and T. Seidl. Fast nearest neigh- 
bor search in high-dimensional spaces. In Proc. IEEE International Conference 
on Data Engineering, pages 209-218, 1998. 

42. S. N. Bespamyatnikh. An optimal algorithm for closest-pair maintenance. Discrete
and Computational Geometry, 19:175-195, 1998. 

43. G. Blankenagel and R. H. Güting. XP-trees: External priority search trees. Technical
report, FernUniversität Hagen, Informatik-Bericht Nr. 92, 1990.

44. K. Brengel, A. Crauser, P. Ferragina, and U. Meyer. An experimental study of 
priority queues in external memory. In Proc. Workshop on Algorithm Engineering, 
LNCS 1668, pages 345-358, 1999. 

45. G. S. Brodal and J. Katajainen. Worst-case efficient external-memory priority 
queues. In Proc. Scandinavian Workshop on Algorithms Theory, LNCS 1432, 
pages 107-118, 1998. 

46. A. L. Buchsbaum, M. Goldwasser, S. Venkatasubramanian, and J. R. Westbrook. 
On external memory graph traversal. In Proc. ACM-SIAM Symp. on Discrete 
Algorithms, pages 859-860, 2000. 

47. P. Callahan, M. T. Goodrich, and K. Ramaiyer. Topology B-trees and their 
applications. In Proc. Workshop on Algorithms and Data Structures, LNCS 955, 
pages 381-392, 1995. 

48. P. B. Callahan and S. R. Kosaraju. Algorithms for dynamic closest-pair and 
n-body potential fields. In Proc. 6th ACM-SIAM Sympos. Discrete Algorithms, 
pages 263-272, 1995. 




49. P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point 
sets with applications to k-nearest-neighbors and n-body potential fields. Journal
of the ACM, 42(1):67-90, 1995.

50. T. M. Chan. Random sampling, halfspace range reporting, and construction of
(≤ k)-levels in three dimensions. SIAM Journal of Computing, 30(2):561-575,
2000.

51. B. Chazelle. Filtering search: a new approach to query-answering. SIAM J. 
Comput., 15(3):703-724, 1986.

52. B. Chazelle. A functional approach to data structures and its use in multidimen- 
sional searching. SIAM J. Comput., 17(3):427-462, June 1988. 

53. B. Chazelle. Lower bounds for orthogonal range searching: I. the reporting case. 
Journal of the ACM, 37(2):200-212, Apr. 1990. 

54. B. Chazelle and L. J. Guibas. Fractional cascading: I. A data structuring tech- 
nique. Algorithmica, 1:133-162, 1986. 

55. B. Chazelle, L. J. Guibas, and D. T. Lee. The power of geometric duality. BIT, 
25(1):76-90, 1985.

56. S. W. Cheng and R. Janardan. New results on dynamic planar point location. 
SIAM J. Comput., 21(5):972-999, 1992. 

57. Y.-J. Chiang. Dynamic and I/O-Efficient Algorithms for Computational Geometry
and Graph Problems: Theoretical and Experimental Results. PhD thesis,
Brown University, August 1995.

58. Y.-J. Chiang. Experiments on the practical I/O efficiency of geometric algorithms: 
Distribution sweep vs. plane sweep. In Proc. Workshop on Algorithms and Data 
Structures, LNCS 955, pages 346-357, 1995. 

59. Y.-J. Chiang, M. T. Goodrich, E. F. Grove, R. Tamassia, D. E. Vengroff, and 
J. S. Vitter. External-memory graph algorithms. In Proc. ACM-SIAM Symp. on 
Discrete Algorithms, pages 139-149, 1995. 

60. Y.-J. Chiang and C. T. Silva. I/O optimal isosurface extraction. In Proc. IEEE
Visualization, pages 293-300, 1997. 

61. Y.-J. Chiang and C. T. Silva. External memory techniques for isosurface ex- 
traction in scientific visualization. In J. Abello and J. S. Vitter, editors, External 
memory algorithms and visualization, pages 247-277. American Mathematical So- 
ciety, DIMACS series in Discrete Mathematics and Theoretical Computer Science, 
1999. 

62. Y.-J. Chiang, C. T. Silva, and W. J. Schroeder. Interactive out-of-core isosurface 
extraction. In Proc. IEEE Visualization, pages 167-174, 1998. 

63. D. Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2):121-137, 1979. 

64. A. Crauser and P. Ferragina. On constructing suffix arrays in external memory. In 
Proc. Annual European Symposium on Algorithms, LNCS 1643, pages 224-235,
1999. 

65. A. Crauser, P. Ferragina, K. Mehlhorn, U. Meyer, and E. Ramos. Randomized 
external-memory algorithms for some geometric problems. In Proc. ACM Symp. 
on Computational Geometry, pages 259-268, 1998. 

66. A. Crauser and K. Mehlhorn. LEDA-SM: A Platform for Secondary Memory 
Computation. Max-Planck-Institut für Informatik, 1999. The manual and software
distribution are available on the web at
http://www.mpi-sb.mpg.de/~crauser/leda-sm.html.

67. A. Crauser and K. Mehlhorn. LEDA-SM: Extending LEDA to secondary memory. 
In Proc. Workshop on Algorithm Engineering, 1999. 




68. M. de Berg, J. Gudmundsson, M. Hammar, and M. Overmars. On R-trees with 
low stabbing number. In Proc. Annual European Symposium on Algorithms, pages 
167-178, 2000. 

69. D. J. DeWitt, N. Kabra, J. Luo, J. M. Patel, and J.-B. Yu. Client-server paradise. 
In Proc. International Conf. on Very Large Databases, pages 558-569, 1994. 

70. J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. Tarjan. Making data structures 
persistent. Journal of Computer and System Sciences, 38:86-124, 1989. 

71. H. Edelsbrunner. A new approach to rectangle intersections, part I. Int. J. 
Computer Mathematics, 13:209-219, 1983. 

72. H. Edelsbrunner. A new approach to rectangle intersections, part II. Int. J. 
Computer Mathematics, 13:221-229, 1983. 

73. H. Edelsbrunner and M. Overmars. Batched dynamic solutions to decomposable 
searching problems. Journal of Algorithms, 6:515-542, 1985. 

74. H. Edelsbrunner and M. H. Overmars. On the equivalence of some rectangle 
problems. Inform. Process. Lett., 14:124-127, 1982. 

75. G. Evangelidis, D. Lomet, and B. Salzberg. The hB^Π-tree: A multi-attribute index
supporting concurrency, recovery and node consolidation. The VLDB Journal,
6(1):1-25, 1997.

76. R. Fadel, K. V. Jakobsen, J. Katajainen, and J. Teuhola. Heaps and heapsort on 
secondary storage. Theoretical Computer Science, 220(2):345-362, 1999.

77. M. Farach, P. Ferragina, and S. Muthukrishnan. Overcoming the memory
bottleneck in suffix tree construction. In Proc. IEEE Symp. on Foundations of
Comp. Sci., pages 174-183, 1998.

78. P. Ferragina and R. Grossi. A fully-dynamic data structure for external substring 
search. In Proc. ACM Symp. on Theory of Computation, pages 693-702, 1995. 

79. P. Ferragina and R. Grossi. Fast string searching in secondary storage: Theoretical 
developments and experimental results. In Proc. ACM-SIAM Symp. on Discrete 
Algorithms, pages 373-382, 1996. 

80. P. Ferragina and F. Luccio. Dynamic dictionary matching in external memory. 
Information and Computation, 146(2):85-99, 1998.

81. E. Feuerstein and A. Marchetti-Spaccamela. Memory paging for connectivity and
path problems in graphs. In Proc. Int. Symp. on Algorithms and Computation, 
LNCS 762, pages 416-425, 1993. 

82. P. Franciosa and M. Talamo. Time optimal halfplane search on external memory. 
Unpublished manuscript, 1997. 

83. P. G. Franciosa and M. Talamo. Orders, k-sets and fast halfplane search on
paged memory. In Proc. Workshop on Orders, Algorithms and Applications 
(ORDAL’94), LNCS 831, pages 117-127, 1994. 

84. G. N. Frederickson. A structure for dynamically maintaining rooted trees. In 
Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 175-184, 1993. 

85. V. Gaede and O. Gunther. Multidimensional access methods. ACM Computing 
Surveys, 30(2):170-231, 1998. 

86. M. T. Goodrich, J.-J. Tsay, D. E. Vengroff, and J. S. Vitter. External-memory 
computational geometry. In Proc. IEEE Symp. on Foundations of Comp. Sci., 
pages 714-723, 1993. 

87. S. Govindarajan, T. Lukovszki, A. Maheshwari, and N. Zeh. I/O-efficient
well-separated pair decomposition and its applications. In Proc. Annual European
Symposium on Algorithms, pages 220-231, 2000. 

88. D. Greene. An implementation and performance analysis of spatial data access 
methods. In Proc. IEEE International Conference on Data Engineering, pages 
606-615, 1989. 




89. R. Grossi and G. F. Italiano. Efficient cross-tree for external memory. In 
J. Abello and J. S. Vitter, editors, External Memory Algorithms and Visualization,
pages 87-106. American Mathematical Society, DIMACS series in Discrete
Mathematics and Theoretical Computer Science, 1999. Revised version available
at ftp://ftp.di.unipi.it/pub/techreports/TR-00-16.ps.Z.

90. R. Grossi and G. F. Italiano. Efficient splitting and merging algorithms for order 
decomposable problems. Information and Computation, 154(1):1-33, 1999.

91. O. Gunther. The design of the cell tree: An object-oriented index structure for ge- 
ometric databases. In Proc. IEEE International Conference on Data Engineering, 
pages 598-605, 1989. 

92. A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proc.
SIGMOD Intl. Conf. on Management of Data, pages 47-57, 1984.

93. J. M. Hellerstein, E. Koutsoupias, and C. H. Papadimitriou. On the analysis of
indexing schemes. In Proc. ACM Symp. Principles of Database Systems, pages 
249-256, 1997. 

94. K. H. Hinrichs. The grid file system: Implementation and case studies of applica- 
tions. PhD thesis, Dept. Information Science, ETH, Zurich, 1985. 

95. S. Huddleston and K. Mehlhorn. A new data structure for representing sorted 
lists. Acta Informatica, 17:157-184, 1982. 

96. D. Hutchinson, A. Maheshwari, J.-R. Sack, and R. Velicescu. Early experiences 
in implementing the buffer tree. In Proc. Workshop on Algorithm Engineering, 
pages 92-103, 1997. 

97. D. Hutchinson, A. Maheshwari, and N. Zeh. An external-memory data structure
for shortest path queries. In Proc. Annual Combinatorics and Computing
Conference, LNCS 1627, pages 51-60, 1999. 

98. C. Icking, R. Klein, and T. Ottmann. Priority search trees in secondary memory.
In Proc. Graph-Theoretic Concepts in Computer Science, LNCS 314, pages 84-93,
1987.

99. I. Kamel and C. Faloutsos. On packing R-trees. In Proc. International Conference 
on Information and Knowledge Management, pages 490-499, 1993. 

100. I. Kamel and C. Faloutsos. Hilbert R-tree: An improved R-tree using fractals. In 
Proc. International Conf. on Very Large Databases, pages 500-509, 1994. 

101. P. G. Kanellakis, S. Ramaswamy, D. E. Vengroff, and J. S. Vitter. Indexing 
for data models with constraints and classes. Journal of Computer and System 
Sciences, 52(3):589-612, 1996. 

102. K. V. R. Kanth and A. K. Singh. Optimal dynamic range searching in non- 
replicating index structures. In Proc. International Conference on Database The- 
ory, LNCS 1540, pages 257-276, 1999. 

103. N. Katayama and S. Satoh. The SR-tree: An index structure for high-dimensional 
nearest-neighbor queries. In Proc. SIGMOD Intl. Conf. on Management of Data,
pages 369-380, 1997. 

104. D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming.
Addison-Wesley, Reading MA, second edition, 1998.

105. G. Kollios, D. Gunopulos, and V. J. Tsotras. On indexing mobile objects. In 
Proc. ACM Symp. Principles of Database Systems, pages 261-272, 1999. 

106. E. Koutsoupias and D. S. Taylor. Tight bounds for 2-dimensional indexing 
schemes. In Proc. ACM Symp. Principles of Database Systems, pages 52-58, 
1998. 

107. V. Kumar and E. Schwabe. Improved algorithms and data structures for solv- 
ing graph problems in external memory. In Proc. IEEE Symp. on Parallel and 
Distributed Processing, pages 169-177, 1996. 




108. S. T. Leutenegger, M. A. Lopez, and J. Edgington. STR: A simple and efficient 
algorithm for R-tree packing. In Proc. IEEE International Conference on Data 
Engineering, pages 497-506, 1996. 

109. D. Lomet and B. Salzberg. The hB-tree: A multiattribute indexing method 
with good guaranteed performance. ACM Transactions on Database Systems, 
15(4):625-658, 1990. 

110. A. Maheshwari and N. Zeh. External memory algorithms for outerplanar graphs. 
In Proc. Int. Symp. on Algorithms and Computation, LNCS 1741, pages 307-316, 
1999. 

111. A. Maheshwari and N. Zeh. I/O-efficient algorithms for graphs of bounded 
treewidth. In Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 89-90, 
2001.

112. J. Matousek. Efficient partition trees. Discrete Comput. Geom., 8:315-334, 1992. 

113. E. McCreight. Priority search trees. SIAM Journal of Computing, 14(2):257-276, 
1985. 

114. K. Mehlhorn and S. Naher. Dynamic fractional cascading. Algorithmica, 5:215- 
241, 1990. 

115. K. Mehlhorn and S. Naher. LEDA: A Platform for Combinatorial and Geometric 
Computing. Cambridge University Press, Cambridge, UK, 2000. 

116. U. Meyer. External memory BFS on undirected graphs with bounded degree. In
Proc. ACM-SIAM Symp. on Discrete Algorithms, pages 87-88, 2001. 

117. D. R. Morrison. PATRICIA: Practical algorithm to retrieve information coded in 
alphanumeric. Journal of the ACM, 15:514-534, 1968. 

118. K. Munagala and A. Ranade. I/O-complexity of graph algorithms. In Proc. ACM-SIAM
Symp. on Discrete Algorithms, pages 687-694, 1999.

119. J. Nievergelt, H. Hinterberger, and K. Sevcik. The grid file: An adaptable, sym- 
metric multikey file structure. ACM Transactions on Database Systems, 9(1):38- 
71, 1984. 

120. J. Nievergelt and E. M. Reingold. Binary search tree of bounded balance. SIAM 
Journal of Computing, 2(1):33-43, 1973.

121. J. Nievergelt and P. Widmayer. Spatial data structures: Concepts and design 
choices. In M. van Kreveld, J. Nievergelt, T. Roos, and P. Widmayer, editors,
Algorithmic Foundations of GIS, pages 153-197. Springer-Verlag, LNCS 1340,
1997. 

122. M. H. Nodine, M. T. Goodrich, and J. S. Vitter. Blocking for external graph 
searching. Algorithmica, 16(2):181-214, 1996. 

123. J. Orenstein. Spatial query processing in an object-oriented database system. In 
Proc. ACM SIGMOD Conf. on Management of Data, pages 326-336, 1986. 

124. J. Orenstein. A comparison of spatial query processing techniques for native and 
parameter spaces. In Proc. SIGMOD Intl. Conf. on Management of Data, pages
343-352, 1990. 

125. M. H. Overmars. The Design of Dynamic Data Structures. Springer-Verlag, LNCS
156, 1983. 

126. D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel approaches to the index- 
ing of moving objects trajectories. In Proc. International Conf. on Very Large 
Databases, pages 395-406, 2000. 

127. S. Ramaswamy and S. Subramanian. Path caching: A technique for optimal 
external searching. In Proc. ACM Symp. Principles of Database Systems, pages 
25-35, 1994. 




128. J. Robinson. The K-D-B tree: A search structure for large multidimensional 
dynamic indexes. In Proc. SIGMOD Intl. Conf. on Management of Data, pages
10-18, 1981. 

129. N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In Proc. 
SIGMOD Intl. Conf. on Management of Data, pages 71-79, 1995.

130. N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases 
using packed R-trees. In Proc. SIGMOD Intl. Conf. on Management of Data,
pages 17-31, 1985. 

131. C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. IEEE 
Computer, 27(3):17-28, 1994.

132. B. Salzberg and V. J. Tsotras. A comparison of access methods for time evolving 
data. ACM Computing Surveys, 31(2):158-221, 1999.

133. H. Samet. Applications of Spatial Data Structures: Computer Graphics, Image
Processing, and GIS. Addison Wesley, MA, 1990. 

134. H. Samet. The Design and Analysis of Spatial Data Structures. Addison Wesley,
MA, 1990. 

135. V. Samoladas and D. Miranker. A lower bound theorem for indexing schemes and 
its application to multidimensional range queries. In Proc. ACM Symp. Principles
of Database Systems, pages 44-51, 1998. 

136. P. Sanders. Fast priority queues for cached memory. In Proc. Workshop on 
Algorithm Engineering and Experimentation, LNCS 1619, pages 312-327, 1999.

137. N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees. 
Communications of the ACM, 29:669-679, 1986.

138. B. Seeger and H.-P. Kriegel. The buddy-tree: An efficient and robust access 
method for spatial data base systems. In Proc. International Conf. on Very
Large Databases, pages 590-601, 1990. 

139. T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for
multi-dimensional objects. In Proc. International Conf. on Very Large Databases,
pages 507-518, 1987. 

140. J. Snoeyink. Point location. In J. E. Goodman and J. O’Rourke, editors, Handbook
of Discrete and Computational Geometry, chapter 30, pages 559-574. CRC Press 
LLC, Boca Raton, FL, 1997. 

141. S. Subramanian and S. Ramaswamy. The P-range tree: A new data structure for 
range searching in secondary memory. In Proc. ACM-SIAM Symp. on Discrete
Algorithms, pages 378-387, 1995. 

142. J. D. Ullman and M. Yannakakis. The input/output complexity of transitive 
closure. Annals of Mathematics and Artificial Intelligence, 3:331-360, 1991.

143. J. Vahrenhold and K. H. Hinrichs. Planar point-location for large data sets: To 
seek or not to seek. In Proc. Workshop on Algorithm Engineering, 2000. 

144. J. van den Bercken, B. Seeger, and P. Widmayer. A generic approach to bulk 
loading multidimensional index structures. In Proc. International Conf. on Very
Large Databases, pages 406-415, 1997. 

145. J. van den Bercken, B. Seeger, and P. Widmayer. A generic approach to processing 
non-equijoins. Technical Report 14, Philipps-Universität Marburg, Fachbereich
Mathematik und Informatik, 1998.

146. M. J. van Kreveld and M. H. Overmars. Divided k-d trees. Algorithmica,
6:840-858, 1991.

147. P. J. Varman and R. M. Verma. An efficient multiversion access structure. IEEE 
Transactions on Knowledge and Data Engineering, 9(3):391-409, 1997. 

148. D. E. Vengroff. A transparent parallel I/O environment. In Proc. DAGS Sympo- 
sium on Parallel Computation, 1994. 




149. D. E. Vengroff and J. S. Vitter. Efficient 3-D range searching in external memory. 
In Proc. ACM Symp. on Theory of Computation, pages 192-201, 1996. 

150. D. E. Vengroff and J. S. Vitter. I/O-efficient scientific computation using TPIE. 
In Proceedings of the Goddard Conference on Mass Storage Systems and Technologies,
NASA Conference Publication 3340, Volume II, pages 553-570, 1996.

151. J. S. Vitter. External memory algorithms and data structures. In J. Abello and 
J. S. Vitter, editors, External Memory Algorithms and Visualization, pages 1-38.
American Mathematical Society, DIMACS series in Discrete Mathematics and 
Theoretical Computer Science, 1999. 

152. J. S. Vitter. Online data structures in external memory. In Proc. Annual In- 
ternational Colloquium on Automata, Languages, and Programming, LNCS 1644, 
pages 119-133, 1999. 

153. J. S. Vitter and E. A. M. Shriver. Algorithms for parallel memory, I: Two-level 
memories. Algorithmica, 12(2-3):110-147, 1994. 

154. S. Šaltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the
positions of continuously moving objects. In Proc. SIGMOD Intl. Conf. on Management
of Data, pages 331-342, 2000.

155. O. Wolfson, A. P. Sistla, S. Chamberlain, and Y. Yesha. Updating and querying 
databases that track mobile units. Distributed and Parallel Databases, 7(3):257- 
287, 1999. 

156. N. Zeh. I/O-efficient planar separators and applications. Manuscript, 2001. 

157. D. Zhang, A. Markowetz, V. Tsotras, D. Gunopulos, and B. Seeger. Efficient 
computation of temporal aggregates with range predicates. In Proc. ACM Symp. 
Principles of Database Systems, pages 237-245, 2001. 




Some Algorithmic Problems in Large Networks 



Susanne Albers 

Dept. of Computer Science, Freiburg University, 79110 Freiburg, Germany
albers@informatik.uni-freiburg.de



Abstract. We will investigate a number of algorithmic problems that 
arise in large networks such as the world-wide web. We will mostly con- 
centrate on the following problems. 

General Caching Problems: Caching is a very well-studied problem. Con- 
sider a two-level memory system consisting of a small fast memory, that 
can store up to k bits, and a large slow memory, that can store poten- 
tially infinitely many bits. The goal is to serve a sequence of memory 
accesses with low total cost. In standard caching it is assumed that all 
memory pages have a uniform size and a uniform fault cost. In a gen- 
eral caching problem, however, pages or documents have varying sizes 
and varying costs. This problem arises, among other places, in the cache 
design for networked file systems or the world-wide web. In the web, for
instance, the transmission time of a web page depends on the location 
of the corresponding server and also on transient conditions such as the 
server load or network congestion. 

The offline variant of general caching problems is NP-hard. Irani [...]
recently presented polynomial time algorithms that achieve an approximation
ratio of $O(\log(k/s))$, where $s$ is the size of the smallest document
ever requested. We present polynomial time constant factor approximation
algorithms that use a small amount of additional memory. The values
of the approximation ratios depend on the exact cost model assumed.
In the most general setting we achieve a $(4+\epsilon)$-approximation, for any
$\epsilon > 0$. The main idea of our solutions is to formulate caching problems
as integer programs and solve linear relaxations. A fractional solution is
transformed into a feasible solution using a new rounding technique. Our
results were published in [...].
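The general caching cost model can be made concrete with a short Python sketch, written under our own assumptions: each document has a size (in the same units as the capacity k) and a fault cost, a request for an uncached document pays its fault cost and brings the document into the cache, and documents larger than k are fetched but never cached. Eviction follows LRU purely as a familiar baseline; this is not the LP-rounding approximation algorithm described above, and the function name is hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


def lru_general_caching_cost(requests: List[str],
                             docs: Dict[str, Tuple[int, float]],
                             k: int) -> float:
    """Total fault cost of serving `requests` with an LRU cache of capacity k.

    docs maps a document id to (size, fault_cost). Sizes and k use the
    same unit (e.g. bits). Evicts least-recently-used documents until the
    requested one fits.
    """
    cache: "OrderedDict[str, int]" = OrderedDict()  # doc id -> size, LRU first
    used = 0
    total = 0.0
    for d in requests:
        size, fault_cost = docs[d]
        if d in cache:
            cache.move_to_end(d)  # hit: mark as most recently used
            continue
        total += fault_cost  # fault: pay this document's cost
        if size > k:
            continue  # can never fit; fetched but not cached
        while used + size > k:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[d] = size
        used += size
    return total
```

With varying sizes, a single large document can force several smaller ones out at once, which is one reason the uniform-size analysis of standard caching does not carry over directly.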

Management of persistent TCP connections: Communication between 
clients and servers in the web is performed using HTTP (Hyper Text 
Transfer Protocol), which in turn uses TCP (Transmission Control Pro- 
tocol) to transfer data. If data has to be transmitted between two network 
nodes, then there has to exist an open TCP connection between these 
two nodes. While the earlier HTTP/1.0 opened and closed a separate
TCP connection for each transmission request, the newer HTTP/1.1
permits persistent connections, i.e., a TCP connection may be kept open
and idle until a new transmission request arrives or the connection is
explicitly closed by the server or the client. The problem is to maintain
a limited number of open TCP connections at each network node, not
knowing which connections will be required in the future.

Cohen et al. [...] recently initiated the study of connection caching in
the web and presented optimal competitive online algorithms assuming
that the establishment cost is uniform for all the connections. They also
analyzed various models of communication. We investigate the setting
where connections may incur different establishment costs and present
online algorithms that achieve optimal competitiveness. Our algorithms
use extra communication between network nodes while managing
open connections. We can develop refined algorithms that allow trade-offs
between extra communication and competitive performance. We also
consider problem extensions where connections may have time-out values
or asymmetric establishment costs. The results appeared in [...].
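The connection-caching cost model can be sketched in Python under simplifying assumptions. A request (u, v) is free if the connection is already open and otherwise pays its establishment cost. Each node holds at most `capacity` open connections and closes its least-recently-used one when full, which also frees the slot at the other endpoint. This per-node LRU rule is only an illustrative baseline with a hypothetical function name; it is not the algorithm of Cohen et al. or of the work summarized here, which the text describes only at a high level (the latter uses extra communication between nodes).

```python
from collections import OrderedDict
from typing import Dict, FrozenSet, Hashable, List, Tuple

Edge = FrozenSet[Hashable]  # an undirected connection {u, v}


def lru_connection_caching_cost(requests: List[Tuple[Hashable, Hashable]],
                                cost: Dict[Edge, float],
                                capacity: int) -> float:
    """Total establishment cost when each node keeps at most `capacity`
    open connections and closes its least-recently-used one when full.

    requests: sequence of (u, v) pairs with u != v.
    cost: maps frozenset({u, v}) to the cost of establishing that connection.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    open_at: Dict[Hashable, "OrderedDict[Edge, None]"] = {}
    total = 0.0
    for u, v in requests:
        if u == v:
            raise ValueError("a connection needs two distinct endpoints")
        edge = frozenset((u, v))
        cu = open_at.setdefault(u, OrderedDict())
        cv = open_at.setdefault(v, OrderedDict())
        if edge in cu:  # already open: free, refresh LRU order at both ends
            cu.move_to_end(edge)
            cv.move_to_end(edge)
            continue
        total += cost[edge]  # pay to establish the connection
        for node, cache in ((u, cu), (v, cv)):
            if len(cache) >= capacity:
                old, _ = cache.popitem(last=False)
                # closing a connection also frees its slot at the other endpoint
                (other,) = old - {node}
                open_at[other].pop(old, None)
        cu[edge] = None
        cv[edge] = None
    return total
```

With capacity 1, opening {a, c} forces a to close {a, b}, so a later request between a and b pays its establishment cost again.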

Dynamic TCP acknowledgment: Consider a sequence of data packets
that arrives at a network node over an open TCP connection. The node
has to acknowledge the receipt of the packets by sending acknowledgments
to the sending site. Most implementations of TCP use some acknowledgment
delay mechanism, i.e., multiple incoming data packets are
acknowledged with a single acknowledgment. Reducing the number
of acknowledgments sent reduces network congestion and the
overhead incurred in sending and receiving acknowledgments.
On the other hand, by sending fewer acknowledgments, we increase the
latency of the TCP connection. The goal is to acknowledge dynamically
a sequence of data packets that arrives over time, so that the number
of acknowledgments and the acknowledgment delays for the individual
packets are as small as possible.

The study of dynamic TCP acknowledgment was initiated by Dooly et
al. [?]. They considered the objective function of minimizing the sum of
the number of acknowledgments and the total acknowledgment delays
for all the packets. They presented deterministic online algorithms that
achieve an optimal competitive ratio of 2. Recently, Karlin et al. [?] gave
randomized algorithms that achieve a competitiveness of $e/(e-1) \approx 1.58$.
We consider a different objective function that penalizes long acknowledgment
delays. In practice, if a data packet is not acknowledged within
a certain amount of time, the packet is resent by the sending site, which
further increases network congestion. We investigate an objective function
that minimizes the sum of the number of acknowledgments and the
maximum delay that ever occurs for any of the data packets. We present
optimal online algorithms [2].
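To make the two objective functions concrete, the following minimal Python sketch scores a given acknowledgment schedule under both of them. It assumes (our convention, not necessarily the one used in [2]) that each packet is covered by the first acknowledgment sent at or after its arrival, and that arrival and acknowledgment times are sorted; the function name `ack_costs` is hypothetical. It only evaluates a schedule and does not implement any of the online algorithms mentioned above.

```python
from typing import List, Tuple


def ack_costs(arrivals: List[float], acks: List[float]) -> Tuple[float, float]:
    """Score an acknowledgment schedule under two objectives.

    arrivals: packet arrival times, sorted in nondecreasing order.
    acks:     times at which acknowledgments are sent, sorted.
    Assumption: a packet is acknowledged by the first ack sent at or
    after its arrival time.

    Returns (number of acks + total delay, number of acks + maximum delay).
    """
    total_delay = 0.0
    max_delay = 0.0
    j = 0
    for t in arrivals:
        # Advance to the first acknowledgment that is not earlier than t.
        while j < len(acks) and acks[j] < t:
            j += 1
        if j == len(acks):
            raise ValueError(f"packet arriving at {t} is never acknowledged")
        delay = float(acks[j] - t)
        total_delay += delay
        max_delay = max(max_delay, delay)
    return len(acks) + total_delay, len(acks) + max_delay


# Four packets, two acknowledgments: the delays are 1, 0.5, 0 and 0.
print(ack_costs([0, 0.5, 1, 3], [1, 3]))  # (3.5, 3.0)
```

The same schedule can score differently under the two objectives: a single late acknowledgment is cheap in the count term but may dominate the maximum-delay term.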
Abstract. We survey recent and not so recent results related to the
computation of exact and approximate distances, and corresponding 
shortest, or almost shortest, paths in graphs. We consider many different 
settings and models and try to identify some remaining open problems. 



1 Introduction 

The problem of finding distances and shortest paths in graphs is one of the most 
basic, and most studied, problems in algorithmic graph theory. A great variety 
of intricate and elegant algorithms were developed for various versions of this 
problem. Nevertheless, some basic problems in this area of research are still open. 
In this short survey, I will try to outline the main results obtained, and mention 
some of the remaining open problems. 

The input to all versions of the problem is a graph $G = (V, E)$. The graph G
may be directed or undirected, and it may be weighted or unweighted. If the graph
is weighted, then each edge $e \in E$ has a weight, or length, $w(e)$ attached to it.
The edge weights are either arbitrary real numbers, or they may be integers. In
either case, the weights may be required to be nonnegative, or may be allowed to be negative.

We may be interested in the distances and shortest paths from a single source
vertex s to all other vertices of the graph; this is known as the Single-Source
Shortest Paths (SSSP) problem. Or we may be interested in the distances and
shortest paths between all pairs of vertices in the graph; this is known as the
All-Pairs Shortest Paths (APSP) problem. We may insist on getting exact distances
and genuine shortest paths, or we may be willing to settle for approximate
distances and almost shortest paths. The errors in the approximate distances that
we are willing to accept may be of an additive or multiplicative nature.

If we insist on explicitly obtaining the distances between any pair of vertices
in the graph, then the size of the output is $\Omega(n^2)$, where n is the number of
vertices of the graph. Perhaps this is not what we had in mind. We may be
interested, for example, only in a concise implicit approximation to all the
distances. This may be achieved, for example, by finding a sparse subgraph that
approximates all the distances in G. Such a subgraph is called a spanner.

F. Meyer auf der Heide (Ed.): ESA 2001, LNCS 2161, pp. 33-48, 2001.
© Springer-Verlag Berlin Heidelberg 2001

U. Zwick

Finally, perhaps the implicit approximation of all the distances offered by 
spanners is not enough. We may want a concise representation of approximate 
distances, together with quick means of extracting these approximations when 
we need them. This leads us to the study of approximate distance oracles. 

This summarizes the different problems considered in this survey. We still
need to specify, however, the computational models used. We use two different
variants of the unit cost Random Access Machine (RAM) model (see [?]).
When the edge weights are real numbers, the only operations we allow on the
edge weights, and the numbers derived from them, are addition and comparison.
These operations are assumed to take $O(1)$ time. No other operations on real
numbers are allowed. We call this the addition-comparison model. It is reminiscent
of the algebraic computation tree model (see, e.g., [?]), though we are
counting all operations, not only those that manipulate weights, and want a single
concise program that works for any input size. When the edge weights are
integral, we adopt the word RAM model, which opens the way for more varied
algorithmic techniques. In this model, each word of memory is assumed to be
w bits wide, capable of holding an integer in the range $\{-2^{w-1}, \ldots, 2^{w-1}-1\}$.

We assume that every distance in the graph fits into one machine word. We also
assume that $w \ge \log n$, so that, for example, the name of a vertex can be stored in
a single machine word. Other than that, no assumptions are made regarding the
relation between n and w. We are allowed to perform additions, subtractions,
comparisons, shifts, and various logical bit operations on machine words. Each
such operation takes only $O(1)$ time. (Surprisingly, shifts, ANDs, XORs, and
other such operations, which seem to have little to do with the problem of computing
shortest paths, do speed up algorithms for the problem, though only by
sub-logarithmic factors.) In some cases, we allow operations like multiplication,
but generally we try to avoid such non-$AC^0$ operations. See [?] for a further
discussion of this model.



Most of the algorithmic techniques used by the algorithms considered are
combinatorial. There are, however, some relations between distance problems
and matrix multiplication. Thus, some of the algorithms considered, mostly for
the APSP problem, rely on fast matrix multiplication algorithms (see, e.g., [?]).
Some of the algorithms considered are deterministic while others are randomized.



There are many more interesting variants of problems related to distances
and shortest paths that are not considered in this short survey. These include:
algorithms for restricted families of graphs, such as planar graphs (see, e.g.,
[?]); parallel algorithms (see, e.g., [?]); algorithms for dynamic versions of
the problems (see, e.g., [?]); routing problems (see, e.g., [?]); geometrical
problems involving distances (see, e.g., [?]); and many more.

Also, this short survey adopts a theoretical point of view. Problems involving
distances and shortest paths in graphs are not only great mathematical problems,
but are also very practical problems encountered in everyday life. Thus, there
is great interest in developing algorithms for these problems that work well in
practice. This, however, is a topic for a different survey that I hope someone else
will write. For some discussion of practical issues see, e.g., [?].

2 Basic Definitions 

Let $G = (V, E)$ be a graph. We let $|V| = n$ and $|E| = m$. We always assume
that $m \ge n$. The distance $\delta(u, v)$ from u to v in the graph is the smallest length
of a path from u to v in the graph, where the length of a path is the sum of
the weights of the edges along it. If the graph is unweighted, then the weight of
each edge is taken to be 1. If the graph is directed, then the paths considered
should be directed. If there is no (directed) path from u to v in the graph, we
define $\delta(u, v) = +\infty$. If all the edge weights are nonnegative, then all distances
are well defined. If the graph contains (directed) cycles of negative weight, and
there is a path from u to v that passes through such a negative cycle, we let
$\delta(u, v) = -\infty$.

Shortest paths from a source vertex s to all other vertices of the graph can 
be compactly represented using a tree of shortest paths. This is a tree, rooted 
at s, that spans all the vertices reachable from s in the graph, such that for 
every vertex v reachable from s in the graph, the unique path in the tree from s
to v is a shortest path from s to v in the graph. Almost all the algorithms we
discuss return such a tree (or a tree of almost shortest paths), or some similar 
representation. In most cases, producing such a tree is straightforward. In other 
cases, doing so without substantially increasing the running time of the algorithm 
is a non-trivial task. Due to lack of space, we concentrate here on the computation 
of exact or approximate distances, and only briefly mention issues related to the 
generation of a representation of the corresponding paths. 



3 Single-Source Shortest Paths 

We begin with the single-source shortest paths problem. The input is a graph
$G = (V, E)$ and a source $s \in V$. The goal is to compute all the distances $\delta(s, v)$,
for $v \in V$, and construct a corresponding shortest paths tree. The following
subsections consider various versions of this problem.



3.1 Nonnegative Real Edge Weights 

If the input graph $G = (V, E)$ is unweighted, then the problem is easily solved
in $O(m)$ time using Breadth First Search (BFS) (see, e.g., [?]). We suppose,
therefore, that each edge $e \in E$ has a nonnegative real edge weight $w(e) \ge 0$
associated with it. The problem can then be solved using the classical algorithm
of Dijkstra [?]. For each vertex v of G, we hold a tentative distance $d(v)$. Initially
$d(s) = 0$, and $d(v) = +\infty$, for every $v \in V - \{s\}$. We also keep a set T of unsettled
vertices. Initially $T = V$. At each stage we choose an unsettled vertex u with
minimum tentative distance, make it settled, and explore the edges emanating
from it. If $(u, v) \in E$, and v is still unsettled, we update the tentative distance
of v as follows: $d(v) \leftarrow \min\{d(v), d(u) + w(u, v)\}$. This goes on until all vertices
are settled. It is not difficult to prove that when a vertex u becomes settled we
have $d(u) = \delta(s, u)$, and that $d(u)$ does not change again.

An efficient implementation of Dijkstra’s algorithm uses a priority queue to 
hold the unsettled vertices. The key associated with each unsettled vertex is 
its tentative distance. Vertices are inserted into the priority queue using insert 
operations. An unsettled vertex with minimum tentative distance is obtained 
using an extract-min operation. Tentative distances are updated using decrease- 
key operations. 
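A minimal Python sketch of the procedure described above follows. The standard-library binary heap `heapq` has no decrease-key operation, so (as an implementation choice, not part of the description above) a vertex is re-inserted whenever its tentative distance drops, and stale heap entries are skipped when popped; this gives the heap-based $O(m \log n)$ bound discussed next, not the Fibonacci-heap bound. The parent pointers form the tree of shortest paths of Section 2. The adjacency-list format and the names `dijkstra` and `tree_path` are assumptions for illustration.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Adjacency lists: vertex -> list of (neighbor, nonnegative weight).
Graph = Dict[int, List[Tuple[int, float]]]


def dijkstra(graph: Graph, s: int) -> Tuple[Dict[int, float], Dict[int, Optional[int]]]:
    """Return (d, parent): distances from s and a tree of shortest paths."""
    d = {v: math.inf for v in graph}        # tentative distances
    parent: Dict[int, Optional[int]] = {v: None for v in graph}
    d[s] = 0
    settled = set()
    heap = [(0, s)]                          # insert(s)
    while heap:
        du, u = heapq.heappop(heap)          # extract-min
        if u in settled:
            continue                         # stale entry left by a re-insertion
        settled.add(u)                       # now d[u] == delta(s, u)
        for v, w in graph[u]:
            if v not in settled and du + w < d[v]:
                d[v] = du + w                # d(v) <- min{d(v), d(u) + w(u, v)}
                parent[v] = u
                heapq.heappush(heap, (d[v], v))   # re-insert instead of decrease-key
    return d, parent


def tree_path(parent: Dict[int, Optional[int]], v: int) -> List[int]:
    """Follow parent pointers from a reachable vertex v back to the source."""
    path: List[int] = []
    node: Optional[int] = v
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


g = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
d, parent = dijkstra(g, 0)
print(d)                     # {0: 0, 1: 3, 2: 1, 3: 4}
print(tree_path(parent, 3))  # [0, 2, 1, 3]
```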

A simple heap-based priority queue can perform insert, extract-min and
decrease-key operations in $O(\log n)$ worst-case time per operation. This
immediately gives an $O(m \log n)$ time SSSP algorithm. Fibonacci heaps of Fredman
and Tarjan [?] require only $O(1)$ amortized time per insert and decrease-key
operation, and $O(\log n)$ amortized time per extract-min operation. This gives an
$O(m + n \log n)$ time SSSP algorithm. (Relaxed heaps of Driscoll et al. [22] may
also be used to obtain this result. They require $O(1)$ worst-case (not amortized)
time per decrease-key operation, and $O(\log n)$ worst-case time per extract-min
operation.) The only operations performed by these algorithms on edge weights
are additions and comparisons. Furthermore, every sum of weights computed by
these algorithms is the length of a path in the graph. The $O(m + n \log n)$ time
algorithm is the fastest known algorithm in this model.

Dijkstra’s algorithm produces the distances $\delta(s, v)$, for $v \in V$, in sorted order.
It is clear that this requires, in the worst case, $\Omega(n \log n)$ comparisons. (To sort n
elements, form a star with n leaves, attach the elements to be sorted as
weights to the edges, and run the algorithm from the center of the star.) However,
the definition of the SSSP problem does not
require the distances to be returned in sorted order. This leads us to our first
open problem: Is there an algorithm for the SSSP problem in the addition-comparison
model that beats the information-theoretic $\Omega(n \log n)$ lower bound
for sorting? Is there such an algorithm in the algebraic computation tree model?
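The star reduction mentioned in the parenthetical remark can be checked directly. In this minimal sketch (the function name is hypothetical), the values to be sorted, assumed nonnegative, become the weights of the edges of a star, and Dijkstra’s algorithm is run from the center. Because vertices are settled in nondecreasing order of distance, recording the distance of each newly settled leaf outputs the values in sorted order.

```python
import heapq
from typing import Dict, List, Tuple


def sort_via_dijkstra(values: List[float]) -> List[float]:
    """Sort nonnegative numbers by running Dijkstra's algorithm on a star."""
    # Center 0; leaf i + 1 is joined to the center by an edge of weight values[i].
    adj: Dict[int, List[Tuple[int, float]]] = {
        0: [(i + 1, x) for i, x in enumerate(values)]
    }
    dist = {0: 0.0}
    heap = [(0.0, 0)]
    settled = set()
    order = []                   # distances in the order vertices are settled
    while heap:
        du, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        order.append(du)
        for v, w in adj.get(u, []):
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return order[1:]             # drop the center, whose distance is 0


print(sort_via_dijkstra([3.0, 1.0, 2.0, 1.0]))  # [1.0, 1.0, 2.0, 3.0]
```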



3.2 Nonnegative Integer Edge Weights — Directed Graphs 

We consider again the single-source shortest paths problem with nonnegative 
edge weights. This time we assume, however, that the edge weights are integral 
and that we can do more than just add and compare weights. This leads to 
improved running times. (The improvements obtained are sub-logarithmic, as 
the running time of Dijkstra’s algorithm in the addition-comparison model is 
already almost linear.) 

The fact that the edge weights are now integral opens up new possibilities. 
Some of the techniques that can be applied are scaling, bucketing, hashing, bit- 
level parallelism, and more. It is also possible to tabulate solutions of small 
subproblems. The description of these techniques is beyond the scope of this 
survey. We merely try to state the currently best available results. 



Exact and Approximate Distances in Graphs - A Survey 






Most of the improved results for the SSSP problem in the word RAM model
are obtained by constructing improved priority queues. Some of the pioneering
results here were obtained by van Emde Boas et al. [?] and Fredman and
Willard [?]. It is enough, in fact, to construct monotone priority queues,
i.e., priority queues that are only required to support sequences of operations in
which the value of the minimum key never decreases.
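As a simple illustration of bucketing and of why monotonicity helps, the sketch below implements Dial’s classical bucket-based variant of Dijkstra’s algorithm. This is a much cruder structure than the priority queues cited here. It assumes integer edge weights in $\{0, \ldots, C\}$ for a known bound C (a parameter we introduce), keeps one bucket per possible distance, and, since the extracted minimum never decreases, sweeps a single cursor upward. Its running time is $O(m + nC)$, so it is only attractive when C is small.

```python
import math
from typing import Dict, List, Tuple


def dial_sssp(graph: Dict[int, List[Tuple[int, int]]], s: int, C: int) -> Dict[int, float]:
    """SSSP for integer edge weights in {0, ..., C} using a monotone bucket queue."""
    d = {v: math.inf for v in graph}
    d[s] = 0
    # Every tentative distance ever inserted is at most C * n: a settled
    # distance is at most C * (n - 1), plus one edge of weight at most C.
    limit = C * len(graph)
    buckets: List[List[int]] = [[] for _ in range(limit + 1)]
    buckets[0].append(s)
    settled = set()
    for cur in range(limit + 1):          # the cursor only moves forward
        bucket = buckets[cur]
        i = 0
        while i < len(bucket):            # zero-weight edges may append to this bucket
            u = bucket[i]
            i += 1
            if u in settled:
                continue                  # stale entry: u was settled from a lower bucket
            settled.add(u)                # here d[u] == cur
            for v, w in graph[u]:
                if cur + w < d[v]:
                    d[v] = cur + w
                    buckets[cur + w].append(v)
    return d


g = {0: [(1, 2), (2, 0)], 1: [(3, 1)], 2: [(1, 1), (3, 3)], 3: [], 4: [(0, 1)]}
print(dial_sssp(g, 0, C=3))  # {0: 0, 1: 1, 2: 0, 3: 2, 4: inf}
```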

Thorup [?] describes a priority queue with $O(\log\log n)$ expected time per
operation. This immediately gives an $O(m \log\log n)$ expected time algorithm for
the SSSP problem. (Randomization is needed here, and in most other algorithms,
to reduce the work space needed to linear.) He also shows that the SSSP problem
is not harder than the problem of sorting the m edge weights. Han [?] describes a
deterministic sorting algorithm that runs in $O(n \log\log n \log\log\log n)$ time. This
gives a deterministic, $O(m \log\log n \log\log\log n)$ time, linear space algorithm for
the SSSP problem. Han’s algorithm uses multiplication. Thorup [?] describes
a deterministic, $O(n(\log\log n)^2)$ time, linear space sorting algorithm that does
not use multiplication, yielding a corresponding SSSP algorithm.

Improved results for graphs that are not too sparse may be obtained by
constructing (monotone) priority queues with constant (amortized) time per
decrease-key operation. Thorup [?] uses this approach to obtain an
$O(m + (n \log n)/w^{1/2-\varepsilon})$ expected time algorithm, for any $\varepsilon > 0$. (Recall that w is the
width of the machine word.) Raman [?], building on results of Ahuja et al. [?]
and Cherkassky et al. [?], obtains an $O(m + n w^{1/4+\varepsilon})$ expected time algorithm,
for any $\varepsilon > 0$. Note that the first algorithm is fast when w is large, while the
second algorithm is fast when w is small. By combining these algorithms, Raman
[?] obtains an $O(m + n(\log n)^{1/3+\varepsilon})$ expected time algorithm, for any $\varepsilon > 0$.
Building on his results from [?], he also obtains deterministic algorithms with
running times of $O(m + n(w \log n)^{1/3})$ and $O(m + n(\log n \log\log n)^{1/2})$.

It is interesting to note that w may be replaced in the running times above 
by log C, where C is the largest edge weight in the graph. 

Finally, Hagerup [?], extending a technique of Thorup [?] for undirected
graphs, obtains a deterministic $O(m \log w)$ time algorithm.

Is there a linear time algorithm for the directed SSSP problem in the word
RAM model? Note that a linear time sorting algorithm would give an affirmative
answer to this question. However, the SSSP problem may be easier than sorting.

3.3 Nonnegative Integer Edge Weights — Undirected Graphs 

All the improved algorithms mentioned above (with one exception) are ‘just’
intricate implementations of Dijkstra’s algorithm. They therefore produce a
sorted list of the distances and do not avoid the sorting bottleneck.

In sharp contrast, Thorup [?] recently developed an elegant algorithm
that avoids the rigid settling order of Dijkstra’s algorithm. Thorup’s algorithm
bypasses the sorting bottleneck and runs in optimal $O(m)$ time! His algorithm
works, however, only on undirected graphs. It remains an open problem whether
extensions of his ideas could be used to obtain a similar result for directed
graphs. (Some results along these lines were obtained by Hagerup [?].)
3.4 Positive and Negative Real Edge Weights

We now allow, for the first time, negative edge weights. The gap here between
the best upper bound and the obvious lower bound is much wider.

Dijkstra’s algorithm breaks down in the presence of negative edge weights.
The best algorithm known for the problem in the addition-comparison model
is the simple $O(mn)$ time Bellman-Ford algorithm (see, e.g., [?]). Start again
with $d(s) = 0$ and $d(v) = +\infty$, for $v \in V - \{s\}$. Then perform the following n
times: for every edge $(u, v) \in E$, let $d(v) \leftarrow \min\{d(v), d(u) + w(u, v)\}$. If any of
the tentative distances changes during the last iteration, then the graph contains
a negative cycle reachable from s. Otherwise, $d(v) = \delta(s, v)$, for every $v \in V$.
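A direct Python transcription of the procedure just described is given below; the edge-list input and the function name are our choices for illustration. It adds one common refinement: it stops early once a full pass changes nothing, since the distances are then final.

```python
import math
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]   # (u, v, w(u, v))


def bellman_ford(n: int, edges: List[Edge], s: int) -> Optional[List[float]]:
    """Distances from s on vertices 0..n-1, or None if a negative cycle
    is reachable from s."""
    d = [math.inf] * n
    d[s] = 0
    for _ in range(n):                  # perform the relaxation pass n times
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:         # d(v) <- min{d(v), d(u) + w(u, v)}
                d[v] = d[u] + w
                changed = True
        if not changed:
            return d                    # a quiet pass: distances are final
    return None                         # still changing in the n-th pass


print(bellman_ford(4, [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 2)], 0))  # [0, 4, 1, 3]
print(bellman_ford(3, [(0, 1, 1), (1, 2, -1), (2, 1, -1)], 0))            # None
```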

The problem of deciding whether a graph contains a negative cycle is a
special case of the problem of finding a minimum mean weight cycle in a graph:
a negative cycle exists if and only if the minimum mean cycle weight is negative.
Karp [?] gives an $O(mn)$ time algorithm for this problem.
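For concreteness, here is a minimal sketch of Karp’s characterization. Let $D_k(v)$ be the minimum weight of a walk with exactly k edges that ends at v and may start anywhere, which amounts to adding a zero-weight super-source. Then the minimum cycle mean equals $\min_v \max_{0 \le k < n} (D_n(v) - D_k(v))/(n-k)$, taken over vertices with $D_n(v)$ finite. The sketch assumes integer weights so that exact `Fraction` arithmetic can be used, and the names are illustrative. A negative result signals a negative cycle.

```python
import math
from fractions import Fraction
from typing import List, Optional, Tuple


def min_mean_cycle(n: int, edges: List[Tuple[int, int, int]]) -> Optional[Fraction]:
    """Minimum mean weight of a directed cycle (Karp), or None if acyclic.

    Runs in O(mn) time and O(n^2) space; weights are assumed to be integers.
    """
    INF = math.inf
    # D[k][v]: minimum weight of a walk with exactly k edges ending at v.
    D = [[INF] * n for _ in range(n + 1)]
    D[0] = [0] * n                        # empty walks, one per start vertex
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] + w < D[k][v]:
                D[k][v] = D[k - 1][u] + w
    best: Optional[Fraction] = None
    for v in range(n):
        if D[n][v] == INF:
            continue                      # no n-edge walk ends at v
        worst = max(Fraction(D[n][v] - D[k][v], n - k)
                    for k in range(n) if D[k][v] != INF)
        if best is None or worst < best:
            best = worst
    return best


# Cycles 0 -> 1 -> 2 -> 0 (mean 2) and 1 -> 3 -> 1 (mean 1).
print(min_mean_cycle(4, [(0, 1, 1), (1, 2, 2), (2, 0, 3), (1, 3, 1), (3, 1, 1)]))  # 1
```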

Is there an $o(mn)$ time algorithm for the single-source shortest paths problem
with positive and negative weights in the addition-comparison model?

3.5 Positive and Negative Integer Edge Weights 

Goldberg [?], improving results of Gabow [?] and of Gabow and Tarjan [?],
uses scaling to obtain an $O(\sqrt{n}\, m \log N)$ time algorithm for this version of the
problem, where N is the absolute value of the smallest (most negative) edge weight.
(It is assumed that $N \ge 2$.) Goldberg’s algorithm, like most algorithms dealing with negative
edge weights, uses potentials (see Section 4.2). No progress on the problem was
made in recent years. Is there a better algorithm?

4 All-Pairs Shortest Paths — Exact Results 

We now move to consider the all-pairs shortest paths problem. The input is
again a graph $G = (V, E)$. The required output is a matrix holding all the
distances $\delta(u, v)$, for $u, v \in V$, and some concise representation of all shortest
paths, possibly a shortest path tree rooted at each vertex.

4.1 Nonnegative Real Edge Weights 

If all the edge weights are nonnegative, we can simply run Dijkstra’s algorithm
independently from each vertex. The running time would be $O(mn + n^2 \log n)$.
Karger, Koller and Phillips [?] and McGeoch [?] note that by orchestrating
the operation of these n Dijkstra processes, some unnecessary operations
may be saved. This leads to an $O(m^* n + n^2 \log n)$ time algorithm for the problem,
where $m^*$ is the number of essential edges in G, i.e., the number of edges that
actually participate in shortest paths. In the worst case, however, $m^* = m$.

Karger et al. [?] also introduce the notion of path-forming algorithms. These
are algorithms that work in the addition-comparison model, with the additional
requirement that any sum of weights computed by the algorithm is the length
of some path in the graph. (All the addition-comparison algorithms mentioned
so far satisfy this requirement.) They show that any path-forming algorithm for
the APSP problem must perform, in the worst case, $\Omega(mn)$ operations.
4.2 Positive and Negative Real Edge Weights

When some of the edge weights are negative, Dijkstra’s algorithm cannot be used
directly. However, Johnson [?] observed that if there are no negative weight
cycles, then new nonnegative edge weights that preserve shortest paths can be
computed in $O(mn)$ time. The idea is very simple. Assign each vertex $v \in V$ a
potential $p(v)$. Define new edge weights as follows: $w_p(u, v) = w(u, v) + p(u) - p(v)$,
for every $(u, v) \in E$. It is easy to verify that the new distances satisfy
$\delta_p(u, v) = \delta(u, v) + p(u) - p(v)$, for every $u, v \in V$. (The potentials along any path
from u to v cancel out, except those of u and v.) Thus, the shortest paths with respect
to the new edge weights are also shortest paths with respect to the original
edge weights, and the original distances can be easily extracted. Now, add to G
a new vertex s, and add zero weight edges from it to all other vertices of the
graph. Let $p(v) = \delta(s, v)$. If there are no negative weight cycles in the graph,
then these distances are well defined and they can be found in $O(mn)$ time
using the Bellman-Ford algorithm. The triangle inequality $\delta(s, v) \le \delta(s, u) + w(u, v)$
immediately implies that $w_p(u, v) \ge 0$, for every $(u, v) \in E$. Now we can run
Dijkstra’s algorithm from each vertex with the new nonnegative weights. The total
running time is again $O(mn + n^2 \log n)$.
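A compact Python sketch of Johnson’s scheme as just described: Bellman-Ford from an added vertex computes the potentials, every edge is reweighted by $w_p(u,v) = w(u,v) + p(u) - p(v)$, Dijkstra’s algorithm runs from every vertex, and the original distances are recovered as $\delta(u,v) = \delta_p(u,v) - p(u) + p(v)$. The edge-list format and names are assumptions. A binary heap is used, so the Dijkstra phase costs $O(mn \log n)$ here rather than the Fibonacci-heap bound.

```python
import heapq
import math
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]


def johnson(n: int, edges: List[Edge]) -> Optional[List[List[float]]]:
    """All-pairs distances on vertices 0..n-1, or None if there is a negative cycle."""
    # Potentials p(v) = delta(s, v), where s = n is a new vertex joined to
    # every vertex by a zero-weight edge; computed by Bellman-Ford.
    aug = edges + [(n, v, 0) for v in range(n)]
    p = [math.inf] * (n + 1)
    p[n] = 0
    for _ in range(n + 1):
        changed = False
        for u, v, w in aug:
            if p[u] + w < p[v]:
                p[v] = p[u] + w
                changed = True
        if not changed:
            break
    else:
        return None                 # still changing after n + 1 passes
    # Reweight: w_p(u, v) = w(u, v) + p(u) - p(v) >= 0 by the triangle inequality.
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + p[u] - p[v]))
    dist = []
    for s in range(n):              # Dijkstra from every vertex
        d = [math.inf] * n
        d[s] = 0
        heap = [(0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > d[u]:
                continue            # stale heap entry
            for v, wp in adj[u]:
                if du + wp < d[v]:
                    d[v] = du + wp
                    heapq.heappush(heap, (d[v], v))
        # Undo the potentials: delta(s, v) = delta_p(s, v) - p(s) + p(v).
        dist.append([d[v] - p[s] + p[v] if d[v] < math.inf else math.inf
                     for v in range(n)])
    return dist


E = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 2), (3, 0, 1)]
print(johnson(4, E))  # [[0, 4, 1, 3], [0, 0, -3, -1], [3, 7, 0, 2], [1, 5, 2, 0]]
```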

As m may be as high as $\Omega(n^2)$, the running time of Johnson’s algorithm may
be as high as $\Omega(n^3)$. A running time of $O(n^3)$ can also be achieved using the
simple Floyd-Warshall algorithm (see [?]). We next consider the possibility of
obtaining faster algorithms for dense graphs.

The all-pairs shortest paths problem is closely related to the $\{\min, +\}$-product
of matrices. If $A = (a_{ij})$ and $B = (b_{ij})$ are $n \times n$ matrices, we let
$A \star B$ be the $n \times n$ matrix whose $(i,j)$-th element is $(A \star B)_{ij} = \min_k \{a_{ik} + b_{kj}\}$.
We refer to $A \star B$ as the distance product of A and B.

Let $G = (V, E)$ be a graph. We may assume that $V = \{1, 2, \ldots, n\}$. Let
$W = (w_{ij})$ be an $n \times n$ matrix with $w_{ii} = 0$, $w_{ij} = w(i, j)$ if $i \ne j$ and $(i, j) \in E$,
and $w_{ij} = +\infty$ otherwise. It is easy to see that $W^n$, where the exponentiation is done with
respect to distance product, gives the distance between any pair of vertices in
the graph. Furthermore, the graph contains a negative cycle if and only if there
are negative elements on the diagonal of $W^n$. Thus, the APSP problem can be
easily solved using $O(\log n)$ distance products (by repeated squaring). In fact, under some
reasonable assumptions, this logarithmic factor can be saved, and it can be shown that the
APSP problem is not harder than the problem of computing a single distance
product of two $n \times n$ matrices (see [?, Theorem 5.7 on p. 204]).
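The sketch below (Python, names hypothetical) computes the distance product naively and obtains all distances by repeated squaring. It relies on the diagonal convention $w_{ii} = 0$, so the k-th power accounts for all paths with at most k edges, and it assumes there are no negative cycles. It is meant only to make the algebra concrete: each squaring here costs $\Theta(n^3)$, which is exactly the bottleneck discussed next.

```python
import math
from typing import List

Matrix = List[List[float]]


def distance_product(A: Matrix, B: Matrix) -> Matrix:
    """(A * B)[i][j] = min_k (A[i][k] + B[k][j]), computed naively in O(n^3)."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apsp_by_squaring(W: Matrix) -> Matrix:
    """All distances from the weight matrix W (math.inf for non-edges,
    zeros on the diagonal), assuming no negative cycles."""
    n = len(W)
    D = [row[:] for row in W]
    k = 1                        # D accounts for all paths with at most k edges
    while k < n - 1:             # shortest paths need at most n - 1 edges
        D = distance_product(D, D)
        k *= 2
    return D


INF = math.inf
W = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
print(apsp_by_squaring(W))  # [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```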

Distance products could be computed naively in $O(n^3)$ time, but this is of
no help to us. Algebraic, i.e., $\{+, \times\}$-products of matrices can be computed
much faster. Strassen [?] was the first to show that it can be done using
$o(n^3)$ operations. Many improvements followed. We let $\omega$ be the exponent of
matrix multiplication, i.e., the smallest constant for which matrix multiplication
can be performed using only $O(n^{\omega + o(1)})$ algebraic operations, i.e., additions,
subtractions, and multiplications. (For brevity, we ‘forget’ the annoying $o(1)$
term, and use $\omega$ as a substitute for $\omega + o(1)$.) Coppersmith and Winograd [?]
showed that $\omega < 2.376$. The only known lower bound on $\omega$ is the trivial lower
bound $\omega \ge 2$.
Could similar techniques be used, directly, to obtain $o(n^3)$ algorithms for
computing distance products? Unfortunately not. The fast matrix multiplication
algorithms rely in an essential way on the fact that addition operations
can be reversed, via subtractions. This opens the way for clever cancellations
that speed up the computation. It is known, in fact, that matrix multiplication
requires $\Omega(n^3)$ operations if only additions and multiplications are allowed. This
follows from lower bounds on monotone circuits for Boolean matrix multiplication
obtained by Mehlhorn and Galil [?] and by Paterson [?].

Yuval [?] describes a simple transformation from distance products to standard
algebraic products. He assumes, however, that exact exponentiations and
logarithms of infinite precision real numbers can be computed in constant time.
His model, therefore, is very unrealistic. His ideas can be exploited, however, in
a more restricted form, as mentioned in Section 4.3. (Several erroneous
follow-ups of Yuval’s result appeared in the 1980s. They are not cited here.)

Fredman [?] describes an elegant way of computing distance products of
two $n \times n$ matrices using $O(n^{5/2})$ additions and comparisons in the algebraic
computation tree model. There does not seem to be an efficient way of
implementing his algorithm in the RAM model, as it requires programs of exponential
size. However, by running his algorithm on small matrices, for which
short enough programs that implement his algorithm can be precomputed,
he obtains an $O(n^3(\log\log n/\log n)^{1/3})$ time algorithm for computing distance
products. Takaoka [?] slightly improves his bound to $O(n^3(\log\log n/\log n)^{1/2})$.

Is there a genuinely sub-cubic algorithm for the APSP problem in the
addition-comparison model, i.e., an algorithm that runs in $O(n^{3-\varepsilon})$ time, for
some $\varepsilon > 0$?



4.3 Integer Edge Weights — Directed Graphs 

We next consider the all-pairs shortest paths problem in directed graphs with
integer edge weights. Even the unweighted case is interesting here. The first to obtain
a genuinely sub-cubic algorithm for the unweighted problem were Alon, Galil
and Margalit [?]. Their algorithm runs in $\tilde{O}(n^{(3+\omega)/2})$ time. Their result also
extends to the case in which the edge weights are in the range $\{0, \ldots, M\}$. The running
time of their algorithm is then $\tilde{O}(M^{(\omega-1)/2} n^{(3+\omega)/2})$ if $M \le n^{(3-\omega)/(\omega+1)}$,
and $\tilde{O}(M n^{(5\omega-3)/(\omega+1)})$ if $M \ge n^{(3-\omega)/(\omega+1)}$ (see Galil and Margalit [?]).

Takaoka [?] obtained an algorithm whose running time is $\tilde{O}(M^{1/3} n^{(6+\omega)/3})$. The
bound of Takaoka is better than the bound of Alon et al. [?] for larger values
of M. The running time of Takaoka’s algorithm is sub-cubic for $M < n^{3-\omega}$.

¹ We use $\tilde{O}(f)$ as a shorthand for $f \cdot (\log n)^{O(1)}$. In the SSSP problem we are fighting
to shave off sub-logarithmic factors. In the APSP problem the real battle is still
over the right exponent of n, so we use the $\tilde{O}(\cdot)$ notation to hide not so interesting
polylogarithmic factors.

The algorithms of Galil and Margalit [?] and of Takaoka [?] were improved
by Zwick [?]. Furthermore, his algorithm works with edge weights in
the range $\{-M, \ldots, M\}$. The improvement is based on two ingredients.
The first is an $\tilde{O}(Mn^{\omega})$ algorithm, mentioned in [?], for computing distance
products of $n \times n$ matrices whose finite elements are in the range $\{-M, \ldots, M\}$.
This algorithm is based on the idea of Yuval [?]. It is implemented this time,
however, in a realistic model. The algorithm uses both the fast matrix multiplication
algorithm of [?] and the integer multiplication algorithm of [?]. (Note
that an $\tilde{O}(Mn^{\omega})$ time algorithm for distance products does not immediately
give an $\tilde{O}(Mn^{\omega})$ time algorithm for the APSP problem, as the range of the
elements is increased by each distance product.) The second ingredient is a sampling
technique that enables the replacement of a distance product of two $n \times n$
matrices by a smaller rectangular product. The algorithm uses, therefore, the
fast rectangular matrix multiplication algorithm of [?] (see also [?]).
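The idea behind such reductions can be illustrated in a few lines of Python. The sketch below is a simplified version under stated assumptions, not Zwick’s actual algorithm: entries are integers in $\{0, \ldots, M\}$, with `None` standing for $+\infty$ (negative entries could be handled by shifting). Each finite entry a is encoded as $(n+1)^{M-a}$. In the ordinary product, entry $(i,j)$ is then a sum of at most n powers of $n+1$ whose largest exponent is $2M - \min_k(a_{ik} + b_{kj})$, so that minimum can be read off from the number of base-$(n+1)$ digits. Python’s exact big integers stand in for the integer-multiplication machinery, and the product is computed naively here. The speedup described in the text comes from performing this product with fast matrix multiplication on numbers of $O(M \log n)$ bits.

```python
from typing import List, Optional

IntMatrix = List[List[Optional[int]]]   # None stands for +infinity


def distance_product_via_integers(A: IntMatrix, B: IntMatrix, M: int) -> IntMatrix:
    """Distance product of matrices with entries in {0..M} or None,
    computed through one ordinary integer matrix product."""
    n = len(A)
    base = n + 1                        # exceeds the number of terms in each sum

    def encode(x: Optional[int]) -> int:
        return 0 if x is None else base ** (M - x)

    A2 = [[encode(x) for x in row] for row in A]
    B2 = [[encode(x) for x in row] for row in B]
    # Ordinary (+, x) product; a fast matrix multiplication algorithm would go here.
    C = [[sum(A2[i][k] * B2[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]

    def decode(c: int) -> Optional[int]:
        if c == 0:
            return None                 # every term involved an infinite entry
        e = 0                           # e = floor(log_base(c)) = 2M - min_k(a_ik + b_kj)
        while c >= base:
            c //= base
            e += 1
        return 2 * M - e

    return [[decode(c) for c in row] for row in C]


A = [[0, 2, None], [1, None, 3], [None, 0, 4]]
B = [[1, None, 0], [2, 2, None], [0, 4, 1]]
print(distance_product_via_integers(A, B, M=4))  # [[1, 4, 0], [2, 7, 1], [2, 2, 5]]
```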

To state the running time of Zwick’s algorithm, we need to introduce exponents
for rectangular matrix multiplication. Let $\omega(r)$ be the smallest constant
such that the product of an $n \times n^r$ matrix by an $n^r \times n$ matrix can
be computed using $O(n^{\omega(r) + o(1)})$ algebraic operations. Suppose that $M = n^t$.
Then, the running time of his algorithm is $\tilde{O}(n^{2+\mu})$, where $\mu = \mu(t)$ satisfies
$\omega(\mu) = 1 + 2\mu - t$. The best available bounds on $\omega(r)$ imply, for example, that
$\mu(0) < 0.575$, so that the APSP problem for directed graphs with edge weights
taken from $\{-1, 0, 1\}$ can be solved in $O(n^{2.575})$ time. The algorithm runs in
sub-cubic time when $M < n^{3-\omega}$, as was the case with Takaoka’s algorithm.

The algorithms mentioned above differ from almost all the other algorithms
mentioned in this survey in that augmenting them to produce a compact
representation of shortest paths, and not only distances, is a non-trivial task. This
requires the computation of witnesses for Boolean matrix multiplications and
distance products. A simple randomized algorithm for computing witnesses for
Boolean matrix multiplication is given by Seidel [?]. His algorithm was derandomized
by Alon and Naor [?] (see also [?]). An alternative, somewhat slower
deterministic algorithm was given by Galil and Margalit [?].

Obtaining improved algorithms, and in particular sub-cubic algorithms for 
larger values of M for this version of the problem is a challenging open problem. 

Finally, Hagerup [?] obtained an $O(mn + n^2 \log\log n)$ time algorithm for the
problem in the word RAM model. Could this be reduced to $O(mn)$?



4.4 Integer Edge Weights Undirected Graphs 

Galil and Margalit [?] and Seidel [?] obtained $\tilde{O}(n^{\omega})$ time algorithms for
solving the APSP problem for unweighted undirected graphs. Seidel’s algorithm
is much simpler. Both algorithms show, in fact, that this version of the problem is
harder than the Boolean matrix multiplication problem by at most a logarithmic
factor. (Seidel’s algorithm, as it appears in [?], uses integer matrix products,
but it is not difficult to obtain a version of it that uses only Boolean products.)
Again, witnesses for Boolean matrix products are needed if paths, and not only
distances, are to be found.
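A short Python sketch of Seidel’s recursion may help. It uses naive integer matrix products, so it runs in $O(n^3 \log n)$ rather than $\tilde{O}(n^{\omega})$ time, and it assumes a connected graph given by a 0/1 adjacency matrix with zero diagonal (on a disconnected graph the recursion would not terminate). The recursion computes distances in the square graph $G^2$, which equal $\lceil \delta(u,v)/2 \rceil$, and then recovers the parity of each distance from one more product. Names are illustrative.

```python
from typing import List

Matrix = List[List[int]]


def _matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Naive integer matrix product (stand-in for fast matrix multiplication)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def seidel(A: Matrix) -> Matrix:
    """Distance matrix of a connected, unweighted, undirected graph."""
    n = len(A)
    if all(A[i][j] for i in range(n) for j in range(n) if i != j):
        return [row[:] for row in A]     # complete graph: distances are 0 or 1
    Z = _matmul(A, A)
    # B is the adjacency matrix of G^2: i ~ j iff delta(i, j) is 1 or 2.
    B = [[1 if i != j and (A[i][j] or Z[i][j] > 0) else 0 for j in range(n)]
         for i in range(n)]
    T = seidel(B)                        # T[i][j] = ceil(delta(i, j) / 2)
    X = _matmul(T, A)
    deg = [sum(row) for row in A]
    # delta(i, j) is even iff X[i][j] >= T[i][j] * deg[j].
    return [[2 * T[i][j] - (1 if X[i][j] < T[i][j] * deg[j] else 0)
             for j in range(n)] for i in range(n)]


# Path 0 - 1 - 2 - 3.
P4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(seidel(P4))  # [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
```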

Seidel’s algorithm is extremely simple and elegant. There seems to be no simple
way, however, of using his ideas to obtain a similar algorithm for weighted
graphs. The algorithm of Galil and Margalit can be extended, in a fairly
straightforward way, to handle small integer edge weights. The running time of their
algorithm, when the edge weights are taken from $\{0, 1, \ldots, M\}$, is
$\tilde{O}(M^{(\omega+1)/2} n^{\omega})$.
An improved time bound of $\tilde{O}(Mn^{\omega})$ for the problem was recently obtained by
Shoshan and Zwick [?]. They show, in fact, that the APSP problem for undirected
graphs with edge weights taken from $\{0, 1, \ldots, M\}$ is harder than the
problem of computing the distance product of two $n \times n$ matrices with elements
taken from the same range by at most a logarithmic factor. (As mentioned in
Section 4.3, this is not known for directed graphs.)

Obtaining improved algorithms, and in particular sub-cubic algorithms for 
larger values of M for this version of the problem is again a challenging open 
problem. For undirected graphs this is equivalent, as mentioned, to obtaining 
faster algorithms for distance products of matrices with elements in the range 
{0,1,..., M}. 

5 All-Pairs Shortest Paths — Approximate Results 

The cost of exactly computing all distances in a graph may be prohibitively 
large. In this section we explore the savings that may be obtained by settling 
for approximate distances and almost shortest paths. Throughout this section, 
we assume that the edge weights are nonnegative. 

We say that an estimated distance $\hat\delta(u, v)$ is of stretch t if and only if
$\delta(u, v) \le \hat\delta(u, v) \le t \cdot \delta(u, v)$. We say that an estimated distance $\hat\delta(u, v)$ is of
surplus t if and only if $\delta(u, v) \le \hat\delta(u, v) \le \delta(u, v) + t$. All our estimates correspond
to actual paths in the graph, and are thus upper bounds on the actual distances.

5.1 Directed Graphs 

It is not difficult to see [?] that for any finite t, obtaining stretch t estimates
of all distances in a graph is at least as hard as Boolean matrix multiplication.
On the other hand, Zwick [?] shows that for any $\varepsilon > 0$, approximate distances
of stretch $1 + \varepsilon$ of all distances in a directed graph may be computed in time
$\tilde{O}((n^{\omega}/\varepsilon) \log(W/\varepsilon))$, where W is the largest edge weight in the graph, after the
edge weights are scaled so that the smallest nonzero edge weight is 1.

5.2 Unweighted Undirected Graphs 

Surprisingly, perhaps, when the graph is undirected and unweighted, estimated distances with a small additive error may be computed rather quickly, without using fast matrix multiplication algorithms. This was first shown by Aingworth et al. [3]. They showed that surplus 2 estimates of the distances between $k$ specified pairs of vertices may be computed in $O(n^{3/2}(k\log n)^{1/2})$ time. In particular, surplus 2 estimates of all the distances in the graph, and corresponding paths, may be computed in $O(n^{5/2}(\log n)^{1/2})$ time. Aingworth et al. [3] also give a 2/3-approximation algorithm for the diameter of a weighted directed graph that runs in $O(m(n\log n)^{1/2} + n^2\log n)$ time.

Elkin [25] describes an algorithm for computing estimated distances from a set $S$ of sources to all other vertices of the graph. He shows that for any $\epsilon > 0$ and $\delta > 0$, there is a constant $b = b(\epsilon,\delta)$ such that an estimated distance $\hat{\delta}(u,v)$ satisfying $\delta(u,v) \le \hat{\delta}(u,v) \le (1+\epsilon)\delta(u,v) + b$, for every $u \in S$ and $v \in V$, may be computed in $O(mn^{\delta} + |S|n^{1+\delta})$ time. Furthermore, the corresponding paths use only $O(n^{1+\delta})$ edges of the graph. (See also Section 6.2.) Note, however, that although the term multiplying $\delta(u,v)$ above can be made arbitrarily close to 1, the errors in the estimates obtained are not purely additive.

Dor et al. [23] obtained improved algorithms for obtaining finite surplus estimates of all distances in the graph. They show that surplus 2 estimates may be computed in $\tilde{O}(n^{3/2}m^{1/2})$ time, and also in $\tilde{O}(n^{7/3})$ time. Furthermore, they exhibit a surplus-time tradeoff showing that surplus $2(k-1)$ estimates of all distances may be computed in $\tilde{O}(kn^{2-1/k}m^{1/k})$ time. In particular, surplus $O(\log n)$ estimates of all distances may be obtained in almost optimal $\tilde{O}(n^2)$ time.



5.3 Weighted Undirected Graphs 

Cohen and Zwick [?] adapted the techniques of Dor et al. [23] for weighted graphs. They obtain stretch 2 estimates of all distances in $\tilde{O}(n^{3/2}m^{1/2})$ time, stretch 7/3 estimates in $\tilde{O}(n^{7/3})$ time, and stretch 3 estimates, and corresponding stretch 3 paths, in almost optimal $\tilde{O}(n^2)$ time. Algorithms with additive errors are also presented. They show, for example, that if $p$ is any path between $u$ and $v$, then the estimate $\hat{\delta}(u,v)$ produced by the $\tilde{O}(n^{3/2}m^{1/2})$ time algorithm satisfies $\hat{\delta}(u,v) \le w(p) + 2w_{\max}(p)$, where $w(p)$ is the length of $p$, and $w_{\max}(p)$ is the weight of the heaviest edge on $p$.



6 Spanners 

6.1 Weighted Undirected Graphs 

Let $G = (V,E)$ be an undirected graph. In many applications, many of them related to distributed computing (see [?]), it is desired to obtain a sparse subgraph $H = (V,F)$ of $G$ that approximates, at least to a certain extent, the distances in $G$. Such a subgraph $H$ is said to be a $t$-spanner of $G$ if and only if for every $u,v \in V$ we have $\delta_H(u,v) \le t\cdot\delta_G(u,v)$. (This definition, implicit in [?], appears explicitly in [?].)

Althöfer et al. [7] describe the following simple algorithm for constructing a $t$-spanner of an undirected graph $G = (V,E)$ with nonnegative edge weights. The algorithm is similar to Kruskal's algorithm [?] (see also [?]) for computing minimum spanning trees: Let $F \leftarrow \emptyset$. Consider the edges of $G$ in nondecreasing order of weight. If $(u,v) \in E$ is the currently considered edge and $t\cdot w(u,v) < \delta_F(u,v)$, then add $(u,v)$ to $F$. It is easy to see that at the end of this process $H = (V,F)$ is indeed a $t$-spanner of $G$. It is also easy to see that the girth of $H$ is greater than $t+1$. (The girth of a graph $G$ is the smallest number of edges on a cycle in $G$.) It is known that any graph with at least $n^{1+1/k}$ edges contains a cycle with at most $2k$ edges. It follows that any weighted graph on $n$ vertices has a $(2k-1)$-spanner with $O(n^{1+1/k})$ edges. This result is believed to be tight for any $k \ge 1$. It is proved, however, only for $k = 1, 2, 3$ and $5$. (See, e.g., [?].)
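The greedy procedure of Althöfer et al. described above can be sketched in a few lines of Python. This is an illustrative version under our own assumptions: edges are given as `(w, u, v)` triples, $\delta_F(u,v)$ is recomputed with a plain Dijkstra search over the current spanner for every edge (so no attempt is made to match the running times quoted below), and the function names are hypothetical.

```python
import heapq
import math

def dijkstra_dist(adj, source, target):
    """Shortest-path distance from source to target (math.inf if unreachable)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return math.inf

def greedy_spanner(edges, t):
    """Greedy t-spanner: scan edges (w, u, v) by nondecreasing weight and keep
    (u, v) only if t * w(u, v) < delta_F(u, v) in the spanner F built so far."""
    adj, spanner = {}, []
    for w, u, v in sorted(edges):
        if t * w < dijkstra_dist(adj, u, v):
            spanner.append((w, u, v))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return spanner
```

On the complete graph $K_4$ with unit weights, $t = 1$ keeps all 6 edges, while $t = 3$ (that is, $k = 2$) keeps only a spanning star of 3 edges, every other edge already being spanned by a path of length 2.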
The fastest known implementation of the algorithm of Althöfer et al. [7] runs in $O(mn^{1+1/k})$ time. If the graph is unweighted, then a $(2k-1)$-spanner of size $O(n^{1+1/k})$ can be easily found in $O(m)$ time [?]. Thorup and Zwick [?], improving a result of Cohen [?], give a randomized algorithm for computing a $(2k-1)$-spanner of size $O(kn^{1+1/k})$ in $O(kmn^{1/k})$ expected time.

Approximation algorithms and hardness results related to spanners were obtained by Kortsarz and Peleg [?] and Elkin and Peleg [?].

6.2 Unweighted Undirected Graphs 

Following Elkin and Peleg [?], we say that a subgraph $H$ of an unweighted graph $G$ is an $(a,b)$-spanner of $G$ if and only if $\delta_H(u,v) \le a\cdot\delta(u,v) + b$ for every $u,v \in V$. (For a related notion of emulators, see [?].) Elkin and Peleg [?] and Elkin [?], improving and extending some preliminary results of Dor et al. [?], show that any graph on $n$ vertices has a $(1,2)$-spanner with $O(n^{3/2})$ edges, and that for any $\epsilon > 0$ and $\delta > 0$ there exists $b = b(\epsilon,\delta)$ such that every graph on $n$ vertices has a $(1+\epsilon,b)$-spanner with $O(n^{1+\delta})$ edges.

The intriguing open problem here is whether the $(1+\epsilon,b)$-spanners of [26, 25] could be turned into $(1,b)$-spanners, i.e., purely additive spanners. In particular, it is still open whether there exists a $b > 0$ such that any graph on $n$ vertices has a $(1,b)$-spanner with $o(n^{3/2})$ edges. (In [?] it is shown that any graph on $n$ vertices has a Steiner $(1,4)$-spanner with $O(n^{4/3})$ edges. A Steiner spanner, unlike standard spanners, is not necessarily a subgraph of the approximated graph. Furthermore, the edges of the Steiner spanner may be weighted, even if the original graph is unweighted.)



7 Distance Oracles 

In this section we consider the following problem: We are given a graph $G = (V,E)$. We would like to preprocess it so that subsequent distance queries or shortest path queries could be answered very quickly. A naive solution is to solve the APSP problem, using the best available algorithm, and store the $n \times n$ matrix of distances. Each distance query can then be answered in constant time. The obvious drawbacks of this solution are the large preprocessing time and large space requirements. Much better solutions exist when the graph is undirected, and when we are willing to settle for approximate results.

The term approximate distance oracles is coined in Thorup and Zwick [?], though the problem was considered previously by Awerbuch et al. [?], Cohen [?] and Dor et al. [?]. Improving the results of these authors, Thorup and Zwick [?] show that for any $k \ge 1$, a graph $G = (V,E)$ on $n$ vertices can be preprocessed in $O(kmn^{1/k})$ expected time, constructing a data structure of size $O(kn^{1+1/k})$, such that a stretch $2k-1$ answer to any distance query can be produced in $O(k)$ time. The space requirements of these approximate distance oracles are optimal for $k = 1, 2, 3, 5$, and are conjectured to be optimal for any value of $k$. (This is related to the conjecture regarding the size of $(2k-1)$-spanners made in Section 6.1. See discussion in [?].)
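To make the preprocessing and query structure of such an oracle concrete, here is a compact Python sketch in the spirit of the Thorup-Zwick construction: a hierarchy of sampled sets $A_0 = V \supseteq A_1 \supseteq \cdots \supseteq A_{k-1}$, the nearest "pivot" of each vertex in every $A_i$, and the "bunch" of each vertex. It is our own simplified illustration under loud assumptions: the graph is undirected, connected and has positive weights; the exact distances needed during preprocessing are obtained naively by running Dijkstra from every vertex, so the sketch does not achieve the $O(kmn^{1/k})$ preprocessing time; and all function names are hypothetical. The query loop returns an estimate of stretch at most $2k-1$.

```python
import heapq
import random

INF = float("inf")

def dijkstra(adj, src):
    """Exact distances from src; adj maps a vertex to [(neighbor, weight)]."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def build_oracle(adj, k, seed=0):
    rng = random.Random(seed)
    nodes = list(adj)
    prob = len(nodes) ** (-1.0 / k)
    while True:  # A[0] = V; A[i] keeps each vertex of A[i-1] with prob n^(-1/k)
        A = [set(nodes)]
        for _ in range(1, k):
            A.append({v for v in A[-1] if rng.random() < prob})
        if A[-1]:
            break  # resample until A[k-1] is nonempty
    A.append(set())  # A[k] is empty, so delta(A[k], v) = infinity
    dist = {v: dijkstra(adj, v) for v in nodes}  # naive: n Dijkstra runs
    level = {v: max(i for i in range(k) if v in A[i]) for v in nodes}
    dA, pivot, bunch = {}, {}, {}
    for v in nodes:
        dv = dist[v]
        dA[v] = [min((dv[a] for a in A[i]), default=INF) for i in range(k + 1)]
        pivot[v] = [min(A[i], key=dv.__getitem__) for i in range(k)]
        # w with top level i joins the bunch of v if it is closer than A[i+1]
        bunch[v] = {w: dv[w] for w in nodes if dv[w] < dA[v][level[w] + 1]}
    return dA, pivot, bunch

def query(oracle, u, v):
    """Estimate delta(u, v) with stretch at most 2k - 1."""
    dA, pivot, bunch = oracle
    w, i = u, 0
    while w not in bunch[v]:
        i += 1
        u, v = v, u
        w = pivot[u][i]
    return dA[u][i] + bunch[v][w]
```

With $k = 1$ every bunch is the whole vertex set and the query returns exact distances; for larger $k$ each bunch is smaller, which is where the space savings come from.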
Many open problems still remain regarding the possible tradeoffs between the preprocessing time, space requirement, query answering time, and the obtained stretch of approximate distance oracles. In particular, is it possible to combine the techniques of [?] and [?] to obtain a stretch 3 distance oracle with $\tilde{O}(n^2)$ preprocessing time, $O(n^{3/2})$ space, and constant query time? Which tradeoffs are possible when no randomization is allowed? Finally, all the distance oracles currently available have multiplicative errors. Are there non-trivial distance oracles with additive errors?
Abstract. Prefetching and caching are widely used for improving the performance of file systems. Recent studies have shown that it is important to integrate the two. In this model we consider the following problem. Suppose that a program makes a sequence of $m$ accesses to data blocks. The cache can hold $k$ blocks, where $k < m$. An access to a block in the cache incurs one time unit, and fetching a missing block incurs $d$ time units. A fetch of a new block can be initiated while a previous fetch is in progress. Thus, $d$ block fetches can be in progress simultaneously. The locality of references to the cache is captured by the access graph model of [2]. The goal is to find a policy for prefetching and caching which minimizes the overall execution time of a given reference sequence. This problem is called caching with locality and pipelined prefetching (CLPP). Our study is motivated by the pipelined operation of modern memory controllers, and by program execution on fast processors. For the offline case we show that an algorithm introduced in [?] is optimal. In the online case we give an algorithm which is within a factor of 2 of the optimal in the set of online deterministic algorithms, for any access graph, and $k,d > 1$. Improved ratios are obtained for several important classes of access graphs, including complete graphs and directed acyclic graphs (DAG). Finally, we study the CLPP problem assuming a Markovian access model, on branch trees, which often arise in applications. We give algorithms whose expected performance ratios are within a factor of 2 of the optimal.



1 Introduction 

1.1 Problem Statement 

Caching and prefetching have been studied extensively in the past decades; however, the interaction between the two was not well understood until the important work of Cao et al., who proposed to integrate caching with prefetching. They introduced the following execution model. Suppose that a program makes a sequence of $m$ accesses to data blocks and the cache can hold $k < m$ blocks. An access to a block in the cache incurs one time unit, and fetching a missing block incurs $d$ time units. While accessing a block in the cache, the system can fetch a block from secondary storage, either in response to a cache miss (caching by demand), or before it is referenced, in anticipation of a miss (prefetching); at most one fetch can be in progress at any given time. The Caching with Prefetching (CP) problem is to determine the sequence of block evictions/prefetches so as to minimize the overall time required for accessing all blocks.

Motivated by the operation of modern memory controllers, and by program execution on fast processors,¹ we consider the problem of caching integrated with pipelined prefetching. Here, a fetch of a new block can be initiated while a previous fetch is in progress. Thus, $d$ block fetches can be in progress simultaneously. We adopt the access graph model developed by Borodin et al. [2], which captures locality of reference in memory access patterns of real programs; thus, we assume that any sequence of block references is a walk on an access graph $G$. As before, our measure is the overall execution time of a given reference sequence.

Formally, suppose that a set of $n$ data blocks $b_1, b_2, \ldots, b_n$ is held in secondary storage. The access graph for the program that reads/writes into $b_1, b_2, \ldots, b_n$ is given by a directed graph $G = (V,E)$, where each node corresponds to a block in this set. Any sequence of block references has to obey the locality constraints imposed by the edges of $G$: following a request to a block (node) $u$, the next request has to be either to block $u$ or to a block $v$ such that $(u,v) \in E$. The usage of pipelined prefetching implies that if a prefetch is initiated at time $t$, then the next prefetch can be initiated at time $t+1$.

Given a reference sequence $\sigma = \{r_1, \ldots, r_m\}$, $r_i$ can be satisfied immediately at time $t$, incurring one time unit, if $r_i$ is in the cache; otherwise, if a prefetch of $r_i$ was initiated at time $t_i \le t$, there is a stall for $d(r_i) = d - (t - t_i)$ time units. The total execution time of $\sigma$ is the time to access the $m$ blocks plus the stall time, i.e., $C(\sigma) = m + \sum_{i=1}^{m} d(r_i)$. The problem of Caching with Locality and Pipelined Prefetching (CLPP) can be stated as follows. Given a cache of size $k > 1$, a delivery time $d > 1$, and a reference sequence $\sigma = \{r_1, \ldots, r_m\}$, find a policy for pipelined prefetching and caching that minimizes $C(\sigma)$.
Fig. 1. An example of caching ($k = 3$ and $d = 2$)

Fig. 2. An example of caching with pipelined prefetching, using Algorithm AGG ($k = 3$ and $d = 2$)

Example 1. Consider a program whose block accesses are given by the sequence "DEBGA" (the entire sequence is known at time 0). The cache can hold three blocks; fetching a block takes two time units. Initially A, B and C are in the cache. Figure 1 shows the execution of the optimal caching-by-demand algorithm, which incurs 11 time units.² Figure 2 shows an execution of the Aggressive (AGG) algorithm (see Section 3), which combines caching with pipelined prefetching; thus, the reference sequence is completed within 7 time units, which is optimal.

¹ A detailed survey of these applications is given in [?].

The above example suggests that pipelined prefetching can be helpful. When future accesses are fully known, the challenge for a good algorithm is to achieve maximum overlap between accesses to blocks in the cache and fetches. Without this knowledge, achieving maximum overlap may be harmful, due to evictions of blocks that will be requested in the near future. Thus, more careful decisions need to be made, based on past accesses and on the structure of the underlying access graph.



1.2 Our Results 

We study the CLPP problem both in the offline case, where the sequence of cache accesses is known in advance, and in the online case, where $r_{i+1}$ is revealed to the algorithm only when $r_i$ is accessed. In the offline case (Section 3), we show that algorithm Aggressive (AGG) introduced in [?] is optimal, for any access graph and any $d,k > 1$.

In the online case (Section 4) we give an algorithm which is within a factor of 2 of the optimal in the set of online deterministic algorithms, for any access graph, and $k,d > 1$. Improved ratios are obtained (in Section 5) for several important classes of access graphs, including complete graphs and directed acyclic graphs (DAG). In particular, for complete graphs we obtain a ratio of $1 + 2/k$, for DAGs $\min(2, 1 + k/d)$, and for branch trees $1 + o(1)$. Finally, we study (in Section 6) the CLPP problem assuming a Markovian access model, on branch trees, which often arise in applications. We give algorithms whose expected performance ratios are within a factor of 2 of the optimal for general trees, and $1 + o(1)$ for homogeneous trees.

² Note that if a block $b_i$ is replaced in order to bring the block $b_j$, then $b_i$ becomes unavailable for access when the fetch is initiated; $b_j$ can be accessed when the fetch terminates, i.e., after $d$ time units.

Our results contain two technical contributions. We present (in Section 4) a general proof technique for deriving upper bounds on the competitive ratios of online algorithms for our problem. (The technique is also used for deriving the results in Sections 5 and 6.) The technique relies on comparing a lazy version of a given online algorithm to an optimal algorithm which is allowed to use parallel (rather than pipelined) prefetches; it may be useful for other problems that involve online pipelined service of input sequences. Our second contribution is an asymptotic analysis of the $q$-distance Fibonacci numbers, for any $q \ge 2$. We use asymptotic estimates of these numbers for solving the recursion formula for algorithm DEE in the Markovian model (see Section 6.2).



1.3 Related Work 

The concept of cooperative prefetching and caching was first investigated by Cao et al. [?]; this paper studies offline prefetching and caching algorithms, where fetches are serialized, i.e., at most one fetch can be in progress at any given time. An algorithm called Aggressive (AGG) was shown to yield a $\min(1 + d/k, 2)$-approximation to the optimal. Karlin and Kimbrel [12] extended this study to storage systems which consist of $r$ units (e.g., disks); fetches are serialized on each storage unit, thus up to $r$ block fetches can be processed in parallel. The paper gives performance bounds for several offline algorithms in this setting. Algorithm AGG is shown to achieve a ratio of $1 + rd/k$ to the optimal. Later papers [?] present experimental results for cooperative prefetching and caching in the presence of optional program-provided hints of future accesses.

Note that the classic paging problem, where the cost of an access is zero and the cost of a fault is 1, is a special case of our problem, in which $d \to \infty$.³ There is a wide literature on the caching (paging) problem. (Comprehensive surveys appear in [?].) Borodin et al. [2] introduced the access graph model. The paper presents an online algorithm that is strongly competitive on any access graph. Later works (e.g., [?]) consider extensions of the access graph model, or give experimental results for some heuristics for paging in this model [?].

Karlin et al. [?] introduced the Markov paging problem, in which the access graph model is combined with the generation of reference sequences by a Markov chain. Specifically, the transition from a reference to page $u$ to the reference to page $v$ (both represented as nodes in the access graph of the program) is done with some fixed probability. The paper presents an algorithm whose fault rate is at most a constant factor from the optimal, for any Markov chain.

There has been some earlier work on the Markovian CLPP on branch trees, in which $k > 1$ and $d = k - 1$. We examine here two of the algorithms proposed in these works, namely, the algorithms Eager Execution (EE) [?] and Disjoint Eager Execution (DEE) [?]. EE was shown to perform well in practice; however, no theoretical performance bounds were derived. Raghavan et al. [?] showed that, for the special case of a homogeneous branch tree, where the transition parameter $p$ is close to 1, DEE is optimal to within a constant factor. We improve this result, and show (in Section 6) that DEE is nearly optimal on branch trees, for any $p \in [1/2, 1]$.

³ Thus, when normalizing (by factor $d$) we get that the delivery time equals one, while the access time, $1/d$, asymptotically tends to 0.

Due to space limitations, we omit most of the proofs. Detailed proofs can be found in [?].



2 Preliminaries 

Let $G$ denote an access graph. Any reference sequence $\sigma$ is a path in $G$. (When $G$ is a directed graph, $\sigma$ forms a directed path in $G$.) We assume that $r_i \ne r_{i+1}$ for all $1 \le i < |\sigma|$. Clearly, this makes the problem no easier. Denote by $Paths(G)$ the set of paths in $G$. Let OPT be an optimal offline algorithm for the CLPP. We refer to an optimal offline algorithm and the source of requests together as the adversary, who generates the sequence and serves the access requests offline.

We use competitive analysis (see, e.g., [?]) to establish performance bounds for online algorithms for our problem. The competitive ratio of an online algorithm $A$ on a graph $G$, for fixed $k$ and $d$, is given by $c_{A,k,d}(G) = \sup_{\sigma \in Paths(G)} A(\sigma)/OPT(\sigma)$, where $A(\sigma)$ and $OPT(\sigma)$ are the costs incurred by $A$ and OPT, respectively, for the execution of $\sigma$.

In the Markovian model, each reference sequence $\sigma$ is associated with a probability, such that the probability of $\sigma = r_1, r_2, \ldots, r_m$ is given by $Pr(\sigma) = \prod_{i=1}^{m-1} p_{r_i, r_{i+1}}$, where $p_{r_i, r_{i+1}}$ is the transition probability from $r_i$ to $r_{i+1}$. The expected performance ratio of an algorithm $A$ in the Markovian model on a graph $G$, for fixed $k$ and $d$, is $c_{A,k,d}(G) = \sum_{\sigma \in Paths(G)} Pr(\sigma)\cdot A(\sigma)/OPT(\sigma)$.

We abbreviate the formulation of our results, using the following notation: $c_{k,d}(G)$ ($c^{det}_{k,d}(G)$) is the competitive ratio of an optimal (deterministic) online algorithm for the CLPP on an access graph $G$, for fixed $k$ and $d$; $c_k(G)$ ($c^{det}_k(G)$) is the competitive ratio of an optimal (deterministic) online algorithm on $G$, for a cache of size $k$ and arbitrary $d$. We say that a (deterministic) algorithm $A$ is strongly competitive if $c_{A,k,d}(G) = O(c_k(G))$ ($c_{A,k,d}(G) = O(c^{det}_k(G))$). In the Markovian model we replace, in the above notation, competitive ratio with expected performance ratio.

Finally, an access graph $G$ is called a branch tree if $G$ is an ordered binary out-tree in which every internal node has a left child and a right child. Let $T$ be a branch tree rooted at $r$. In the Markovian CLPP, the transition probability $p_{u,v}$ from any node $u$ to its child $v$ is called the local probability of $v$. Denote by $\{v_0 = r, v_1, \ldots, v_n = v\}$ the path from $r$ to some node $v$; then the accumulated probability of $v$ is given by $p_a(v) = \prod_{i=1}^{n} p_{v_{i-1}, v_i}$.
3 The Offline CLPP Problem

In the offline case we are given the reference sequence, and our goal is to achieve maximal overlap between prefetching and references to blocks in the cache, so as to minimize the overall execution time of the sequence. The next lemma shows that a set of rules formulated in [?] to characterize the behavior of optimal algorithms for the CP problem also applies to the offline CLPP problem.

Lemma 1. [No harm rules] There exists an optimal algorithm $A$ which satisfies the following rules: (i) $A$ fetches the next block in the reference sequence that is missing in the cache; (ii) $A$ evicts the block whose next reference is furthest in the future; (iii) $A$ never replaces a block $B$ by a block $C$ if $B$ will be referenced before $C$.

In the remainder of this section we consider only optimal algorithms that 
follow the “no harm” rules. Clearly, once an algorithm A decides to fetch a 
block, these rules uniquely define the block that should be fetched, and the 
block that will be evicted. Thus, the only decision to be made by any algorithm 
is when to start the next fetch. 

Algorithm AGG, proposed by Cao et al. [?], follows the “no harm” rules; in addition, it fetches each block at the earliest opportunity, i.e., whenever there is a block in the cache whose next reference is after the first reference to the block that will be fetched. (An example of the execution of AGG is given in Figure 2.) As stated in our next result, this algorithm is the best possible for the CLPP.
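The execution model and the AGG rule can be checked against Example 1 with a small discrete-time simulation. The Python sketch below is our own illustration, not code from the paper, and it fixes several modelling assumptions explicitly: each access takes one time unit; a fetch initiated at time $t$ makes its block usable from time $t + d$; at most one fetch is initiated per time unit (pipelining); an evicted block becomes unavailable as soon as the replacing fetch starts; and blocks in flight occupy a cache slot. Passing `prefetch=False` restricts the same "no harm" rules to caching by demand (furthest-in-future eviction, fetching only the block requested now). The function name `simulate` is hypothetical.

```python
import math

def simulate(seq, k, d, initial, prefetch=True):
    """Total execution time C(sigma) under the 'no harm' rules of Lemma 1.

    With prefetch=True a fetch is started whenever AGG's condition holds;
    with prefetch=False a fetch is started only for the block requested now.
    """
    def next_ref(block, i):
        # position of the next reference to block at or after position i
        for j in range(i, len(seq)):
            if seq[j] == block:
                return j
        return math.inf

    cache = {b: 0 for b in initial}   # block -> time from which it is usable
    t, i = 0, 0                       # current time, next position to access
    while i < len(seq):
        # first position >= i whose block is neither cached nor in flight
        j = next((p for p in range(i, len(seq)) if seq[p] not in cache), None)
        if j is not None and (prefetch or j == i):
            if len(cache) < k:
                cache[seq[j]] = t + d
            else:
                victim = max(cache, key=lambda b: next_ref(b, i))
                if next_ref(victim, i) > j:   # AGG's earliest-opportunity test
                    del cache[victim]
                    cache[seq[j]] = t + d
        # during [t, t+1): access seq[i] if it has arrived, otherwise stall
        if cache.get(seq[i], math.inf) <= t:
            i += 1
        t += 1
    return t
```

On Example 1 (sequence "DEBGA", $k = 3$, $d = 2$, initial cache A, B, C) this reproduces the figures in the text: 7 time units for AGG and 11 for caching by demand.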

Theorem 1. AGG is an optimal offline algorithm for the CLPP problem. 

It can be shown by induction on $i$ that, for $i \ge 1$, any optimal offline algorithm $A$ which satisfies the “no harm” rules can be modified to act like AGG in the first $i$ steps, without harming $A$'s optimality. ■

The greediness of AGG plays an important role when $d < k$. In this case, one can show that for any $i \ge 1$, when AGG accesses $r_i$ in the cache, each of the blocks $r_{i+1}, \ldots, r_{i+d-1}$ is either in the cache or is being fetched; thus, on any reference sequence AGG incurs a single miss: in the first reference.



4 The Online CLPP Problem 

Let $G = (V,E)$ be an access graph. Suppose that $\sigma = \{r_1, \ldots, r_l\}$, $1 \le l \le k$, is a reference sequence to blocks, where only $r_1$ is known.

Recall that $\sigma$ is a path in $G$. For a given subgraph $G' \subseteq G$, let $PREF_\sigma(G')$ denote the maximal set of vertices in $G'$ that form a prefix of $\sigma$. Let $A$ be an online deterministic algorithm. Initially, $A$ knows only that $r_1 \in \sigma$. In the Single Phase CLPP (S-CLPP), $A$ needs to select a subgraph $G_A \subseteq G$ of $l$ vertices such that $|PREF_\sigma(G_A)|$ is maximal. Formally, denote by $B(G_A) = \min_{\sigma \in Paths(G)} |PREF_\sigma(G_A)|$ the benefit of $A$ from the selection of $G_A$. We seek an algorithm $A$ whose benefit is maximal.
Consider algorithm AGG_0, which mimics the operation of AGG in an online fashion. Denote by $dist(v,u)$ the length of the shortest path from $v$ to $u$ in $G$.⁴ Let $IN_t$ denote the blocks that are in the cache or are being fetched at time $t$, and $OUT_t = V \setminus IN_t$. The following is a pseudocode description of AGG_0.

Algorithm AGG_0
for t = 1, ..., l do
    Let u = arg min{dist(r_t, v) : v ∈ OUT_t}.
    Let w = arg max{dist(r_t, v) : v ∈ IN_t}.
    If dist(r_t, u) < dist(r_t, w)
        Evict w from the cache and initiate a fetch for u.
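A single iteration of the loop above can be written as a small Python function. The sketch is our own illustration under stated assumptions: the access graph is an adjacency dict, $dist$ is computed by breadth-first search from the current block (unreachable blocks get distance infinity, as in the footnote), ties in the arg min/arg max are broken arbitrarily, and the name `agg0_step` is hypothetical. On a DAG the same rule coincides with algorithm EE of Section 5.

```python
from collections import deque
import math

def bfs_dist(adj, src):
    """Hop distances from src in a directed access graph (adjacency dict)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def agg0_step(adj, current, in_set):
    """One step of AGG_0 at the currently accessed block `current`.

    in_set plays the role of IN_t (cached or in-flight blocks); all other
    vertices form OUT_t. Returns (u, w) if u should be fetched in place of w,
    or None if no fetch is initiated in this step."""
    dist = bfs_dist(adj, current)

    def hop(v):
        return dist.get(v, math.inf)  # unreachable => infinity

    vertices = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    out_set = vertices - set(in_set)
    if not out_set or not in_set:
        return None
    u = min(out_set, key=hop)   # closest block not in IN_t
    w = max(in_set, key=hop)    # farthest (or unreachable) block in IN_t
    return (u, w) if hop(u) < hop(w) else None
```

For the path a→b→c→d with an extra edge a→e, standing at a with IN = {a, b, d}, the step evicts d (distance 3) to fetch e (distance 1).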

Lemma 2. Algorithm AGG_0 is optimal in the set of deterministic online algorithms for the S-CLPP problem, for any graph $G$, and $k,d > 1$.

Theorem 2. For any graph $G$, and $k,d > 1$, there exists an algorithm $A$ such that $c_{A,k,d}(G) \le 2\cdot c^{det}_{k,d}(G)$.

Proof: Consider first Algorithm Lazy-AGG_0, which operates in phases. Phase $i$ starts at some time $t_i$, with a stall of $d$ time units for fetching a missing block. Each phase is partitioned into sub-phases. The first sub-phase of phase $i$ starts at time $t_{i,1} = t_i$. At sub-phase $j$, Lazy-AGG_0 uses the rules of AGG_0 for selecting a subgraph $G_{i,j}$ of $k' = \min(d,k)$ blocks. Some of these blocks are already in the cache; Lazy-AGG_0 initiates pipelined fetching of the remaining blocks. Let $\sigma_{i,j}$ denote the set of the first $d$ block accesses in sub-phase $j$ of phase $i$, let $t_{i,j}$ be the start time of sub-phase $j$, and let $Good(i,j) = PREF_{\sigma_{i,j}}(IN_{t_{i,j}+d})$. We handle two cases separately:

- If $|Good(i,j)| = d$, then Lazy-AGG_0 waits until $d$ blocks are accessed in the cache; at time $t_{i,j} + 2d$ the $j$-th sub-phase terminates, and Lazy-AGG_0 starts sub-phase $j+1$ of phase $i$.

- If $g_j = |Good(i,j)| < d$, then at time $t_{i,j} + d + g_j$ phase $i$ terminates, and the first block missing from the cache becomes the block whose fetch starts phase $i+1$.

Consider now Algorithm AGG'_0, which operates like Lazy-AGG_0; however, AGG'_0 has the advantage that in each sub-phase $j$ all the prefetches are initiated in parallel, and $d$ time units after this sub-phase starts, AGG'_0 knows the value of $g_j$ and the first missing block in the cache. If $g_j = d$ then AGG'_0 proceeds to the next sub-phase of phase $i$; if $g_j < d$, phase $i$ terminates. Note that, combining Lemma 2 with the parallel fetching property, we get that AGG'_0 outperforms any deterministic online algorithm.

To compute $c_{\text{Lazy-AGG}_0,k,d}(G)$ it suffices to compare the length of phase $i$ of Lazy-AGG_0 and AGG'_0, for any $i \ge 1$. Suppose that there are $sp(i)$ sub-phases in phase $i$. For Lazy-AGG_0 each of the first $sp(i) - 1$ sub-phases incurs $2d$ time units, while the last sub-phase incurs $d + g$ time units, for some $1 \le g < d$. For AGG'_0 each sub-phase (including the last one) incurs $d$ time units; thus, we get the ratio

$$\frac{d + g + 2d\,(sp(i) - 1)}{d\cdot sp(i)} < \frac{2d\cdot sp(i)}{d\cdot sp(i)} = 2. \qquad \blacksquare$$

⁴ When $u$ is unreachable from $v$, $dist(v,u) = \infty$.

5 Online CLPP on DAGs and Complete Graphs 

5.1 Directed Acyclic Access Graphs 

When $G$ is a DAG, AGG_0 acts exactly like algorithm EE, defined as follows. At any time $t \ge 1$, fetch a missing block that is closest to the currently accessed block; discard a block that is unreachable from the current position. From Lemma 2 we get that EE is optimal in the set of online deterministic algorithms for the S-CLPP on DAGs. Now, consider the operation of EE on a DAG. Our next result improves the bound in Theorem 2 in the case where $k < d$.

Theorem 3. If $G$ is a DAG then for any cache size $k > 1$ and delivery time $d > 1$, $c_{EE,k,d}(G) \le \min(1 + k/d, 2)\cdot c^{det}_{k,d}(G)$.

Proof: As in the proof of Theorem 2, we use Algorithms Lazy-AGG_0 and AGG'_0. When $G$ is a DAG, we call these algorithms Lazy-EE and EE', respectively. Note that from Theorem 2 we immediately get that $c_{EE,k,d}(G) \le 2\cdot c^{det}_{k,d}(G)$. This follows from the fact that EE performs at least as well as Lazy-EE, since EE initiates a block fetch whenever possible, while Lazy-EE waits for a sequence of fetches to be completed before it starts fetching the next sequence.

When $d > k$ we can improve the ratio of 2: in this case each phase consists of a single sub-phase. Indeed, each block is accessed exactly once; thus, for any $i \ge 1$, during the stall that starts phase $i$, both EE' and Lazy-EE can access at most $k < d$ blocks in the cache (i.e., $g_1 < d$), and phase $i$ terminates. It follows that the ratio between the length of phase $i$ for Lazy-EE and EE' is at most $(d + k)/d$. This completes the proof. ■



Branch Trees. The case where $G$ is a branch tree and $d = k - 1$ is of particular interest in the application of the CLPP to pipelined execution of programs on fast processors (see [?]). For this case, we can tighten the bound in Theorem 3.

Theorem 4. If $G$ is a branch tree then $c_{EE,k,k-1}(G) \le (1 + o(1))\,c_{k,k-1}(G)$, where the $o(1)$ term refers to a function of $k$.

5.2 Complete Graphs 

Suppose that $G$ is a complete graph. On these graphs the lower bound obtained for deterministic algorithms for the classic paging problem remains valid for the CLPP; that is, if $G$ is a complete graph, then for any cache size $k > 1$, $c^{det}_k(G) \ge k - 1$ (see [?]).

Consider the set of marking algorithms proposed for the classical caching (paging) problem (see, e.g., [?]). A marking algorithm proceeds in phases. At the beginning of a phase all the blocks in the cache are unmarked. Whenever a block is requested, it is marked. On a cache fault, the marking algorithm evicts an unmarked block from the cache and fetches the requested one. A phase ends on the first miss in which all the blocks in the cache are marked. At this point all the blocks become unmarked, and a new phase begins.
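The phase structure used in the proof of Lemma 3 can be illustrated with a short Python sketch of a generic marking algorithm for ordinary paging (no fetch timing is modelled). The definition leaves open which unmarked block is evicted; as an arbitrary choice of ours, the sketch evicts the unmarked block that entered the cache earliest, and the cache starts empty. The function name `marking_phases` is hypothetical.

```python
def marking_phases(seq, k):
    """Run a marking algorithm on seq with an initially empty cache of k blocks.

    Returns the number of faults and the list of requests of each phase."""
    cache, marked = [], set()
    faults, phases = 0, [[]]
    for block in seq:
        if block not in cache:
            faults += 1
            if len(cache) == k:
                unmarked = [b for b in cache if b not in marked]
                if not unmarked:          # miss with every cached block marked:
                    marked.clear()        # the current phase ends and all
                    phases.append([])     # blocks become unmarked
                    unmarked = list(cache)
                cache.remove(unmarked[0])  # evict some unmarked block
            cache.append(block)
        marked.add(block)                 # every requested block is marked
        phases[-1].append(block)
    return faults, phases
```

By construction each phase references at most $k$ distinct blocks, which is the property the per-phase cost comparison relies on.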

Lemma 3. For any access graph $G$, cache size $k$ and delivery time $d > 1$, if $A$ is a marking algorithm, then $c_{A,k,d}(G) \le k + 1$.

The proof is based on showing that if the $j$-th phase of $A$ consists of $n_j$ references, then the cost of $A$ on the phase is at most $kd + n_j$, while the cost of OPT is at least $\max(n_j, d)$. Hence the ratio on each phase is at most $(kd + n_j)/\max(n_j, d) \le kd/d + n_j/n_j = k + 1$. ■

We summarize the above discussion in the next result. 

Theorem 5. On a complete graph, any marking algorithm has competitive ratio $1 + 2/k$ with respect to the set of deterministic online algorithms for the CLPP.



6 The Markovian CLPP on Branch Trees 

The following algorithm, known as DEE, is a natural greedy algorithm for the CLPP in the Markovian model: at any time $t \ge 1$, fetch a missing block that is most likely to appear in $\sigma$ from the current position; discard a block that is unreachable from the current position.

In this section we analyze the performance of DEE on branch trees, and show that its expected performance ratio is at most 2. This ratio is reduced to $1 + o(1)$ for a special class of Markov chains that we call homogeneous (see Section 6.2).



6.1 Performance of DEE on Branch Trees 

Let $T$ be a branch tree, and $k > 1$ an integer. Suppose that $\sigma$ is a reference sequence of length $k$. In the Markovian S-CLPP we need to choose a subtree $T_A$ of $T$ of size $k$ such that the expected size of $PREF_\sigma(T_A)$ is maximal. Formally, let $B(T_A) = \sum_{\sigma \in Paths(T)} Pr(\sigma)\cdot|PREF_\sigma(T_A)|$ be the expected benefit of an online algorithm $A$ from $T_A$. We seek an algorithm $A$ whose expected benefit is maximal. We proceed to obtain a performance bound for DEE when applied to the CLPP.
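For a homogeneous branch tree (defined in Section 6.2), DEE's choice in the Markovian S-CLPP can be sketched with a priority queue: starting from the root, repeatedly take the not-yet-chosen node of largest accumulated probability, whose children then become candidates. Because accumulated probabilities never increase along a root-to-leaf path, the chosen nodes form a subtree containing the root, and the expected benefit is then the sum of their accumulated probabilities (each chosen node contributes the probability that $\sigma$ passes through it). This is our own illustration: nodes are encoded as strings of 'L'/'R' moves, ties are broken by that string, and the name `dee_subtree` is hypothetical.

```python
import heapq

def dee_subtree(p, k):
    """Greedy DEE choice of k nodes in a homogeneous branch tree.

    p is the local probability of every left child (1 - p for right children).
    Returns the chosen nodes (as 'L'/'R' strings) and their expected benefit,
    i.e. the sum of the accumulated probabilities of the chosen nodes.
    """
    heap = [(-1.0, "")]          # the root has accumulated probability 1
    chosen, benefit = [], 0.0
    while len(chosen) < k:
        neg_prob, node = heapq.heappop(heap)   # most likely candidate next
        chosen.append(node)
        benefit += -neg_prob
        heapq.heappush(heap, (neg_prob * p, node + "L"))
        heapq.heappush(heap, (neg_prob * (1 - p), node + "R"))
    return chosen, benefit
```

For $p = 0.9$ and $k = 3$ the sketch follows the likely left spine (root, L, LL) with expected benefit $1 + 0.9 + 0.81 = 2.71$; for $p = 1/2$ it takes the root and both children instead.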

Theorem 6. DEE is optimal to within a factor of 2 in the set of online algorithms on branch trees.

For showing the theorem, we use the optimality of DEE for the Markovian S-CLPP, as shown in [?]. The proof is based on defining two variants of DEE, DEE' and Lazy-DEE, which operate in phases, similar to the proof of Theorem 2. We omit the details.
6.2 Homogeneous Branch Trees

We call a branch tree $T$ homogeneous if all the left children in $T$ have the same local probability $p \in [1/2, 1)$. In other words, the transition probability from any vertex $u$ to its left child is $p$ and to its right child is $1 - p$.

In the following we derive an explicit asymptotic expression for the expected benefit of DEE when solving the Markovian S-CLPP on a homogeneous branch tree, with any parameter $1/2 \le p < 1$. Our computations are based on a Fibonacci-type analysis, which suits the homogeneous case well. For any integer $q \ge 2$, the $n$-th number of the $q$-distance Fibonacci sequence is given by

$$g(n) = \begin{cases} 0 & \text{if } n < q - 1,\\ 1 & \text{if } n = q - 1,\\ g(n-1) + g(n-q) & \text{otherwise.} \end{cases}$$

Note that in the special case where q = 2, we get the well-known Fibonacci
numbers.
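A direct transcription of the definition makes the special case q = 2 easy to check. This is a sketch; the function name `q_distance_fib` is ours.

```python
from typing import List


def q_distance_fib(q: int, n_max: int) -> List[int]:
    """Return [g(0), ..., g(n_max)] for the q-distance Fibonacci sequence (q >= 2)."""
    if q < 2:
        raise ValueError("q must be at least 2")
    g: List[int] = []
    for n in range(n_max + 1):
        if n < q - 1:
            g.append(0)
        elif n == q - 1:
            g.append(1)
        else:                      # here n >= q, so n - q >= 0
            g.append(g[n - 1] + g[n - q])
    return g


print(q_distance_fib(2, 10))   # the ordinary Fibonacci numbers
print(q_distance_fib(3, 10))
```

With q = 2 this prints 0, 1, 1, 2, 3, 5, ..., 55. With q = 3 it prints 0, 0, 1, 1, 1, 2, 3, 4, 6, 9, 13.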

Lemma 4. For any n ≥ 1 and a given q ≥ 2, $g(n) = O(r^{n-q}/q)$, where
$1 < r = r(q) < 2$.

Proof. To find an expression for g(n), n ≥ q, we need to solve the equation
generated from the recursion formula, i.e.,

$$x^n = x^{n-1} + x^{n-q}. \qquad (1)$$

In other words, we need to find the roots of the polynomial $p(x) = x^q - x^{q-1} - 1$.
Note that p(x) has no multiple roots. Hence, the general form of our sequence
is $g(n) = \sum_{i=1}^{q} b_i r_i^n$, where $r_1, r_2, \ldots, r_q$ are the roots of the polynomial p(x).

Also, for 1 ≤ x ≤ 2, p(x) is monotone non-decreasing. As p(1) = −1 and
$p(2) = 2^q - 2^{q-1} - 1 = 2^{q-1} - 1 > 0$, we get that p(x) has a single root $r_q$
in the interval [1, 2].

Denote by |x| the modulus of a complex number x. We claim that $|r_i| \le r_q$
for all i < q. The claim trivially holds for any $r_i$ satisfying $|r_i| \le 1$, thus we may
assume that $|r_i| > 1$. If $r_i$ is a root of p(x), then

$$0 = |p(r_i)| = |r_i^q - r_i^{q-1} - 1| \ge |r_i|^q - |r_i|^{q-1} - 1 = p(|r_i|),$$

and since p(x) is non-decreasing for x ≥ 1, we conclude that $|r_i| \le r_q$. In fact, it can be
shown that the last inequality is strict, i.e., $|r_i| < r_q$ for any $1 \le i \le q-1$. We can write

$$g(n) = \sum_{i=1}^{q} b_i r_i^n = b_q r_q^n \Bigl(1 + \sum_{i=1}^{q-1} \frac{b_i}{b_q} \Bigl(\frac{r_i}{r_q}\Bigr)^{n}\Bigr), \qquad (2)$$

and since $|r_i/r_q| < 1$, the sum on the right-hand side of (2) tends to zero exponentially, i.e.,
$\lim_{n\to\infty} \sum_{i=1}^{q-1} \frac{b_i}{b_q} (r_i/r_q)^n = 0$. Hence, we get that

$$g(n) = b_q r_q^n (1 + o(1)). \qquad (3)$$

This sequence can be viewed as a special case of the q-order Fibonacci sequence, in
which $g(n) = \sum_{j=1}^{q} g(n-j)$ (see, e.g., [14, 19]).
Now, $b_q$ can be calculated by solving a linear system for the first q elements
of the sequence g(n):



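$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ r_1 & r_2 & \cdots & r_q \\ \vdots & \vdots & & \vdots \\ r_1^{q-1} & r_2^{q-1} & \cdots & r_q^{q-1} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_q \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$$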



The determinant of the above matrix is known as the Vandermonde determinant.
The general solution of such a system is $b_i = 1/\prod_{j \ne i}(r_i - r_j)$. Our
polynomial is $p(x) = \prod_{i=1}^{q}(x - r_i)$, so $p'(r_i) = \prod_{j \ne i}(r_i - r_j)$; therefore
we get that $b_i = 1/p'(r_i)$. We can calculate all the coefficients. In particular,
$b_q = 1/(q r_q^{q-1} - (q-1) r_q^{q-2})$. Substituting into (3) we get the statement of the
lemma. ■
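Equation (3) and the formula for $b_q$ can be checked numerically. The sketch below uses hypothetical helper names. Bisection is our choice of root finder, which the proof justifies because p(x) is monotone on [1, 2] with a sign change. The sketch computes $r_q$ and $b_q$ and compares g(n) with $b_q r_q^n$.

```python
from typing import List


def q_distance_fib(q: int, n_max: int) -> List[int]:
    """[g(0), ..., g(n_max)] for the q-distance Fibonacci sequence."""
    g: List[int] = []
    for n in range(n_max + 1):
        if n < q - 1:
            g.append(0)
        elif n == q - 1:
            g.append(1)
        else:
            g.append(g[n - 1] + g[n - q])
    return g


def dominant_root(q: int, iters: int = 200) -> float:
    """Bisection for the root r_q of x^q - x^(q-1) - 1 in [1, 2]."""
    lo, hi = 1.0, 2.0                # p(1) = -1 < 0 and p(2) = 2^(q-1) - 1 > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid ** q - mid ** (q - 1) - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def leading_coefficient(q: int) -> float:
    """b_q = 1 / p'(r_q) = 1 / (q r_q^(q-1) - (q-1) r_q^(q-2))."""
    r = dominant_root(q)
    return 1.0 / (q * r ** (q - 1) - (q - 1) * r ** (q - 2))


n = 120
for q in (2, 3, 4, 5):
    g = q_distance_fib(q, n)
    r, b = dominant_root(q), leading_coefficient(q)
    print(q, round(r, 6), g[n] / (b * r ** n))   # ratio close to 1, as in (3)
```

For q = 2 the sketch recovers the golden ratio and $b_2 = 1/\sqrt{5}$, which is Binet's formula for the Fibonacci numbers.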



Lemma 5. Let 1/2 ≤ p < 1 satisfy, for some natural q,

$$p^q = 1 - p, \qquad q \in \mathbb{N}, \qquad (4)$$

and let a = (1 − p)q + p and δ ∈ {0, 1}. Then the height of the subtree chosen by
DEE for the Markovian S-CLPP is given by $\mathrm{height}(T_{DEE}) = \log_{1/p}\bigl((1-p)ak + p\bigr) + \delta + o(1)$.
The o(1) term refers to a function of k.

Proof sketch: Let q be defined as in (4), and denote by f(n) the number
of vertices in T with accumulated probability $p^n$. Then f(n) can be computed
recursively as follows. For 1 ≤ n < q, since $p^n > 1 - p$, there is a single node with
accumulated probability $p^n$. For n ≥ q, we get a node with accumulated probability
$p^n$ either by taking the left child of a node with accumulated probability
$p^{n-1}$, or by taking the right child of a node whose accumulated probability is
$p^{n-q}$. Hence, f(n) = g(n + q − 1), where g(n) is the nth q-distance Fibonacci
number. From (3), $f(n) = b_q r_q^{n+q-1}(1 + o(1))$, where $r_q$ is the single root of
$x^q - x^{q-1} - 1$ in the interval [1, 2]. Using (4), it is easy to verify that $r_q = 1/p$
is this root. Since $b_q r_q^{q-1} = 1/(q - (q-1)p) = 1/a$, we get that $f(n) = (1 + o(1))\,p^{-n}/a$.
Let h be the maximal integer satisfying $\sum_{n=0}^{h} f(n) \le k$. Then,



$$k \ge \sum_{n=0}^{h} f(n) = \frac{p^{-h} - p}{a(1-p)}\,(1 + o(1)). \qquad (5)$$



Also, there exists a vertex with accumulated probability $p^{h+1}$ which is not in
$T_{DEE}$. Thus,



$$k < \sum_{n=0}^{h+1} f(n) = \frac{p^{-(h+1)} - p}{a(1-p)}\,(1 + o(1)). \qquad (6)$$



Combining (5) and (6) we get the statement of the lemma. ■
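The counting step f(n) = g(n + q − 1) can be confirmed by brute force. The sketch below assumes (4), so every accumulated probability is a power of p. It therefore tracks only the exponent: a left child adds 1 and a right child adds q. It then enumerates all vertices with exponent at most a chosen limit. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def q_distance_fib(q: int, n_max: int) -> List[int]:
    """[g(0), ..., g(n_max)] for the q-distance Fibonacci sequence."""
    g: List[int] = []
    for n in range(n_max + 1):
        if n < q - 1:
            g.append(0)
        elif n == q - 1:
            g.append(1)
        else:
            g.append(g[n - 1] + g[n - q])
    return g


def exponent_counts(q: int, limit: int) -> List[int]:
    """f(0..limit): number of vertices whose accumulated probability is p^e.

    Explicit depth-first enumeration of every vertex with exponent <= limit,
    assuming p^q = 1 - p so that a right child multiplies by p^q.
    """
    counts: Counter = Counter()
    stack = [0]                         # the root has accumulated probability p^0
    while stack:
        e = stack.pop()
        counts[e] += 1
        for step in (1, q):             # left child, right child
            if e + step <= limit:
                stack.append(e + step)
    return [counts[e] for e in range(limit + 1)]


for q in (2, 3, 4):
    f = exponent_counts(q, 18)
    g = q_distance_fib(q, 18 + q - 1)
    print(q, f == [g[n + q - 1] for n in range(19)])
```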

Lemma 5 yields an asymptotic expression for the expected benefit of DEE.
Corollary 1. For any 1/2 ≤ p < 1, the expected benefit of DEE in solving the
Markovian S-CLPP problem is

$$B(T_{DEE}) = \frac{1 + \log_{1/p}\bigl((1-p)ak + p\bigr)}{a}\,(1 + o(1)), \qquad (7)$$

where the o(1) term refers to a function of k.
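To see Corollary 1 at work, the following sketch computes the benefit of DEE for a homogeneous tree satisfying (4) and compares it with the leading term of (7). It relies on the same assumption as the earlier DEE sketch: the benefit is the sum of the accumulated probabilities of the chosen vertices. It uses integer exponents so that ties between equal probabilities are exact. The helper names are ours.

```python
import heapq
import math


def p_from_q(q: int, iters: int = 200) -> float:
    """Solve p^q = 1 - p for p in [1/2, 1) by bisection (equation (4))."""
    lo, hi = 0.5, 1.0                  # p^q + p - 1 is increasing on this interval
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid ** q - (1 - mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def dee_benefit(q: int, k: int) -> float:
    """Sum of p^e over the k smallest exponents, i.e. the k most probable vertices."""
    p = p_from_q(q)
    heap = [0]                         # exponent of the root
    total = 0.0
    for _ in range(k):
        e = heapq.heappop(heap)
        total += p ** e
        heapq.heappush(heap, e + 1)    # left child
        heapq.heappush(heap, e + q)    # right child, since 1 - p = p^q
    return total


def corollary_estimate(q: int, k: int) -> float:
    """Leading term of (7): (1 + log_{1/p}((1-p)ak + p)) / a."""
    p = p_from_q(q)
    a = (1 - p) * q + p
    return (1 + math.log((1 - p) * a * k + p, 1 / p)) / a


for q in (2, 3):
    k = 10 ** 5
    print(q, dee_benefit(q, k) / corollary_estimate(q, k))
```

For k = 10^5 the ratio should be within a few percent of 1 for q = 2 and q = 3. Convergence is slow because the relative error shrinks only roughly like 1/log k.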

Corollary 1 is essential for proving the next theorem (see [2]).

Theorem 7. DEE is within a factor 1 + o(1) of the optimum in the set of
algorithms for the Markovian CLPP on homogeneous branch trees, for any
p ∈ [1/2, 1].

Acknowledgments. The authors thank Anna Karlin, Rajeev Motwani and 
Abstract. Parallel disks promise to be a cost-effective means for achieving
high bandwidth in applications involving massive data sets, but algorithms
for parallel disks can be difficult to devise. To combat this problem, we
define a useful and natural duality between writing to parallel disks and the
seemingly more difficult problem of prefetching. We first explore this duality
for applications involving read-once accesses using parallel disks. We get a
simple linear time algorithm for computing optimal prefetch schedules and
analyze the efficiency of the resulting schedules for randomly placed data and
for arbitrary interleaved accesses to striped sequences. Duality also provides
an optimal schedule for the integrated caching and prefetching problem, in
which blocks can be accessed multiple times. Another application of this
duality gives us the first parallel disk sorting algorithms that are provably
optimal up to lower order terms. One of these algorithms is a simple and
practical variant of multiway merge sort, addressing a question that has been
open for some time.



1 Introduction 

External memory (EM) algorithms are designed to be efficient when the problem 
data do not fit into the high-speed random access memory (RAM) of a computer 
and therefore must reside on external devices such as disk drives [17]. In order to 
cope with the high latency of accessing data on such devices, efficient EM algo- 
rithms exploit locality in their design. They access a large block of B contiguous 
data elements at a time and perform the necessary algorithmic operations on 
the elements in the block while in the high-speed memory. The speedup can 
be significant. However, even with blocked access, a single disk provides much 
less bandwidth than the internal memory. This problem can be mitigated by 
using multiple disks in parallel. For each input/output operation, one block is 
transferred between memory and each of the D disks. The algorithm therefore
transfers D blocks at the cost of a single-disk access delay. 

A simple approach to algorithm design for parallel disks is to employ large 
logical blocks, or superblocks of size B · D in the algorithm. A superblock is split
into D physical blocks — one on each disk. We refer to this as superblock striping. 
Unfortunately, this approach is suboptimal for EM algorithms like sorting that 
deal with many blocks at the same time. An optimal algorithm for sorting and 
many related EM problems requires independent access to the D disks, in which 
each of the D blocks in a parallel I/O operation can reside at a different position 
on its disk [19,17]. Designing algorithms for independent parallel disks has been 
surprisingly difficult [19,14,15,3,8,9,17,16,18]. 

In this paper we consider parallel disk output and input separately, in
particular as the output scheduling problem and the prefetch scheduling
problem, respectively. The (online) output scheduling (or queued writing)
problem takes as input a fixed-size pool of m (initially empty) memory buffers
for storing blocks, and the sequence $\Sigma = \langle w_0, w_1, \ldots, w_{L-1}\rangle$ of block write
requests as they are issued. Each write request is labeled with the disk it will
use. The resulting schedule specifies when the blocks are output. The buffer
pool can be used to reorder the outputs with respect to the logical writing
order given by Σ so that the total number of output steps is minimized.

The (offline) prefetch scheduling problem takes as input a fixed-size pool of m
(empty) memory buffers for storing blocks, and the sequence $\langle r_0, r_1, \ldots, r_{L-1}\rangle$
of distinct block read requests that will be issued. Each read request is labeled
with the disk it will use. The resulting prefetch schedule specifies when the blocks
should be fetched so that they can be consumed by the application in the right
order.

The central theme in this paper is the newly discovered duality between these 
two problems. Roughly speaking, an output schedule corresponds to a prefetch 
schedule with reversed time axis and vice versa. We illustrate how computa- 
tions in one domain can be analyzed via duality with computations in the other 
domain. 

Sect. 2 introduces the duality principle formally for the case of distinct blocks
to be written or read (write-once and read-once scheduling). Then Sect. 3
derives an optimal write-once output scheduling algorithm and applies the duality
principle to obtain an optimal read-once prefetch scheduling algorithm.

Even an optimal schedule might use parallel disks very inefficiently because 
for difficult inputs most disks might be idle most of the time. In Sect. 4 we there- 
fore give performance guarantees for randomly placed data and for arbitrarily 
interleaved accesses to a number of data streams. In particular, we discuss the 
following allocation strategies: 

Fully Randomized (FR): Each block is allocated to a random disk. 
Striping (S): Consecutive blocks of a stream are allocated to consecutive disks 
in a simple, round-robin manner. 

Simple Randomized (SR): Striping where the disk selected for the first block 
of each stream is chosen randomly. 
Randomized Cycling (RC): Each stream i chooses a random permutation $\pi_i$
of disk numbers and allocates the j-th block of stream i on disk $\pi_i(j \bmod D)$.
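The four allocation disciplines can each be stated in a few lines. The sketch below is a hypothetical helper set, not code from the paper. Disks are numbered 0, ..., D − 1, and each function returns the disk of every block of a single stream, in stream order.

```python
import random
from typing import List


def fully_randomized(n_blocks: int, D: int, rng: random.Random) -> List[int]:
    # FR: every block goes to an independently chosen random disk.
    return [rng.randrange(D) for _ in range(n_blocks)]


def striping(n_blocks: int, D: int, first_disk: int = 0) -> List[int]:
    # S: consecutive blocks on consecutive disks, round robin.
    return [(first_disk + j) % D for j in range(n_blocks)]


def simple_randomized(n_blocks: int, D: int, rng: random.Random) -> List[int]:
    # SR: striping, but the disk of the first block is chosen at random.
    return striping(n_blocks, D, rng.randrange(D))


def randomized_cycling(n_blocks: int, D: int, rng: random.Random) -> List[int]:
    # RC: block j goes to disk pi(j mod D) for a random permutation pi of the disks.
    pi = list(range(D))
    rng.shuffle(pi)
    return [pi[j % D] for j in range(n_blocks)]


rng = random.Random(1)
print(striping(7, 4), simple_randomized(7, 4, rng), randomized_cycling(7, 4, rng))
```

In a real system each stream would draw its own random start (SR) or permutation (RC). This is what distinguishes RC from a single global permutation.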

In Sect. 5 we relax the restriction that blocks are accessed only once and 
allow repetitions (write-many and read-many scheduling). Again we derive a
simple optimal algorithm for the writing case and obtain an optimal algorithm 
for the reading case using the duality principle. A similar result has recently been 
obtained by Kallahalla and Varman [11] using more complicated arguments. 

Finally, in Sect. 6 we apply the results from Sects. 3 and 4 to parallel disk 
sorting. Results on online writing translate into improved sorting algorithms us- 
ing the distribution paradigm. Results on offline reading translate into improved 
sorting algorithms based on multi-way merging. By appending a ‘D’ for distri- 
bution sort or an ‘M’ for mergesort to an allocation strategy (FR, S, SR, RC) we 
obtain a descriptor for a sorting algorithm (FRD, FRM, SD, SM, SRD, SRM, 
RCD, RCM). This notation is an extension of the notation used in [18]. RCD 
and RCM turn out to be particularly efficient. Let

$$\mathrm{Sort}(N) = \frac{N}{DB}\left(1 + \left\lceil \log_{M/B} \frac{N}{M} \right\rceil\right)$$

and note that 2 · Sort(N) appears to be the lower bound for sorting N elements
on D disks [1]. Our versions of RCD and RCM are the first algorithms that
provably match this bound up to a lower order term O(BD/M) · Sort(N). The
good performance of RCM is particularly interesting. The question of whether 
there is a simple variant of mergesort that is asymptotically optimal has been 
open for some time. 
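As a worked example of the bound, the sketch below evaluates Sort(N). It computes the ceiling of the logarithm with integer arithmetic to avoid floating-point rounding. It assumes that B divides M and that M/B ≥ 2. The function names are ours.

```python
def merge_passes(N: int, M: int, B: int) -> int:
    """ceil(log_{M/B}(N/M)) via integers; 0 if N <= M. Assumes B divides M."""
    fan_in = M // B
    if fan_in < 2:
        raise ValueError("need M/B >= 2")
    passes, covered = 0, M          # invariant: covered = M * (M/B)^passes
    while covered < N:
        covered *= fan_in
        passes += 1
    return passes


def sort_io(N: int, M: int, B: int, D: int) -> float:
    """Sort(N) = N/(DB) * (1 + ceil(log_{M/B}(N/M)))."""
    return N / (D * B) * (1 + merge_passes(N, M, B))


N, M, B, D = 2 ** 30, 2 ** 20, 2 ** 10, 4
print(merge_passes(N, M, B), sort_io(N, M, B, D), 2 * sort_io(N, M, B, D))
```

With N = 2^30, M = 2^20, B = 2^10 and D = 4, a single merge pass suffices and Sort(N) = 2^19 = 524288. The apparent lower bound 2 · Sort(N) is then 2^20 I/O steps.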



Related Work 

Prefetching and caching has been intensively studied and can be a quite diffi- 
cult problem. Belady [5] solves the caching problem for a single disk using our 
machine model. Cao et al. [7] propose a model that additionally allows overlap- 
ping of I/O and computation. Albers et al. [2] were the first to find an optimal 
polynomial time offline algorithm for the single-disk case in this model but it 
does not generalize well to multiple disks. Kimbrel and Karlin [12] devised a 
simple algorithm called reverse aggressive that obtains good approximations in 
the parallel disk case if the buffer pool is large and the failure penalty F is small. 
However, in our model, which corresponds to F → ∞, the approximation ratio
that they show goes to infinity. Reverse aggressive is very similar to our
algorithm, so it is quite astonishing that the algorithm is optimal in our model.
Kallahalla and Varman [10] studied online prefetching of read-once sequences for 
our model. They showed that very large lookahead $L \gg mD$ is needed to obtain
good competitiveness against an optimal offline algorithm. They proposed an 
0{L^D^ time algorithm with this property, and yielding optimal schedules for 
the offline case. A practical disadvantage of this algorithm is that some blocks 
may be fetched and discarded several times before they can be delivered to the 
application. 
There is less work on performance guarantees. A (slightly) suboptimal writing
algorithm is analyzed in [16] for FR allocation and extended to RC-allocation 
in [18]. These results are the basis for our results in Sect. 4. For reading there is 
an algorithm for SR allocation that is close to optimal if m H log D [3] . 

There are asymptotically optimal deterministic algorithms for external sort- 
ing [15], but the constant factors involved make them unattractive in practice. 
Barve et al. [3] introduced a simple and efficient randomized sorting algorithm 
called Simple Randomized Mergesort (SRM). For each run, SRM allocates blocks 
to disks using the SR allocation discipline. SRM comes within γ · Sort(N) of the
apparent lower bound if $M/B = \Omega(D \log(D)/\gamma^2)$, but for $M = o(D \log D)$ the
bound proven is not asymptotically optimal. It was an open problem whether
SRM or another variant of striped mergesort could be asymptotically optimal 
for small internal memory. Knuth [13, Exercise 5.4.9-31] gives the question of a 
tight analysis of SR a difficulty of 48 on a scale between 0 and 50. 

To overcome the apparent difficulty of analyzing SR, Vitter and Hutchin- 
son [18] analyzed RC allocation, which provides more randomness but retains 
the advantages of striping. RCD is an asymptotically optimal distribution sort 
algorithm for multiple disks that allocates successive blocks of a bucket to the 
disks according to the RC discipline and adapts the approach and analysis of 
Sanders, Egner, and Korst [16] for write scheduling of blocks. However, the ques- 
tion remained whether such a result can be obtained for mergesort and how close 
one can come to the lower bound for small internal memory. 

2 The Duality Principle 

Duality is a quite simple concept once the model is properly defined. Therefore, 
we start with a more formal description of the model: 

Our machine model is the parallel disk model of Vitter and Shriver [19] 
with a single¹ processor, D disks and an internal memory of size M. All blocks
have the same size B. In one I/O step, one block on each disk can be accessed 
in a synchronized fashion. We consider either a queued writing or a buffered 
prefetching arrangement, where a pool of m block buffers is available to the 
algorithm (see Fig. 1). 

A write-once output scheduling problem is defined by a sequence
$\Sigma = \langle b_0, \ldots, b_{L-1}\rangle$ of distinct blocks. Let $\mathrm{disk}(b_i)$ denote the disk on which block $b_i$ is
located. An application process writes these blocks in the order specified by Σ.
We use the term write for the logical process of moving a block from the
responsibility of the application to the responsibility of the scheduling algorithm. The
scheduling algorithm orchestrates the physical output of these blocks to disks.
An output schedule is specified by giving a function $\mathrm{oStep} : \{b_0, \ldots, b_{L-1}\} \to \mathbb{N}$
that specifies for each disk block $b_i \in \Sigma$ the time step when it will be output. An
output schedule is correct if the following conditions hold: (i) No disk is
referenced more than once in a single time step, i.e., if $i \ne j$ and $\mathrm{disk}(b_i) = \mathrm{disk}(b_j)$

¹ A generalization of our results to multiple processors is relatively easy as long as data
exchange between processors is much faster than disk access.
Fig. 1. Duality between the prefetching priority and the output step. The hashed blocks
illustrate how the blocks of disk 2 might be distributed. 



then $\mathrm{oStep}(b_i) \ne \mathrm{oStep}(b_j)$. (ii) The buffer pool is large enough to hold all the
blocks $b_j$ that are written before a block $b_i$ but not output before $b_i$, i.e.,

$$\forall\, 0 \le i < L: \quad \mathrm{oBacklog}(b_i) := |\{\, j < i : \mathrm{oStep}(b_j) \ge \mathrm{oStep}(b_i) \,\}| < m.$$

The number of steps needed by an output schedule is $T = \max_{0 \le i < L} \mathrm{oStep}(b_i)$.
A schedule is optimal if it minimizes T among all correct schedules.

It will turn out that our write-once output scheduling algorithms even work 
if they are given the blocks online, i.e., one at a time without specifying Σ
explicitly. 

A read-once prefetch scheduling problem is defined analogously. Now reading
means the logical process of moving a block from the responsibility of the
scheduling algorithm to the application, and fetching means the physical disk
access. A prefetch schedule is defined using a function $\mathrm{iStep} : \{b_0, \ldots, b_{L-1}\} \to \mathbb{N}$.
The limited buffer pool size requires the correctness condition

$$\forall\, 0 \le i < L: \quad \mathrm{iBacklog}(b_i) := |\{\, j > i : \mathrm{iStep}(b_j) \le \mathrm{iStep}(b_i) \,\}| < m$$

(all blocks $b_j$ that are fetched no later than a block $b_i$ but consumed after $b_i$
must be buffered).

It will turn out that our prefetch scheduling algorithms work offline, i.e., 
they need to know Σ in advance.
The following theorem shows that reading and writing not only have
similar models but are equivalent to each other in a quite interesting sense:

Theorem 1. (Duality Principle) Consider any sequence $\Sigma = \langle b_0, \ldots, b_{L-1}\rangle$
of distinct write requests. Let oStep denote a correct output schedule for Σ that
uses T output steps. Then we get a correct prefetch schedule iStep for
$\Sigma^R = \langle b_{L-1}, \ldots, b_0\rangle$ that uses T fetch steps by setting $\mathrm{iStep}(b_i) = T - \mathrm{oStep}(b_i) + 1$.
Vice versa, every correct prefetch schedule iStep for $\Sigma^R$ that uses T fetch
steps yields a correct output schedule $\mathrm{oStep}(b_i) = T - \mathrm{iStep}(b_i) + 1$ for Σ, using
T output steps.

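Proof. For the first part, consider $\mathrm{iStep}(b_i) = T - \mathrm{oStep}(b_i) + 1$. The resulting
fetch steps are between 1 and T, and all blocks on the same disk get different
fetch steps. It remains to show that $\mathrm{iBacklog}(b_i) < m$ for $0 \le i < L$. In $\Sigma^R$, the
blocks consumed after $b_i$ are exactly the $b_j$ with $j < i$, so with respect to $\Sigma^R$ we have

$$\begin{aligned}
\mathrm{iBacklog}(b_i) &= |\{\, j < i : \mathrm{iStep}(b_j) \le \mathrm{iStep}(b_i) \,\}| \\
&= |\{\, j < i : T - \mathrm{oStep}(b_j) + 1 \le T - \mathrm{oStep}(b_i) + 1 \,\}| \\
&= |\{\, j < i : \mathrm{oStep}(b_j) \ge \mathrm{oStep}(b_i) \,\}|.
\end{aligned}$$

The latter value is $\mathrm{oBacklog}(b_i)$ with respect to Σ and hence smaller than m.
The proof for the converse case is completely analogous. □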
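The correctness conditions and the time reversal of Theorem 1 are easy to check mechanically. The sketch below maps an output schedule to its dual prefetch schedule and tests both sets of conditions. It uses hypothetical function names, identifies blocks by their position in Σ, and uses quadratic-time checks for clarity.

```python
from typing import List, Sequence, Tuple


def disks_distinct_per_step(disks: Sequence[int], steps: Sequence[int]) -> bool:
    # Condition (i): no disk is referenced twice in the same step.
    pairs = list(zip(disks, steps))
    return len(pairs) == len(set(pairs))


def output_schedule_ok(disks: Sequence[int], o_step: Sequence[int], m: int) -> bool:
    """Check a write-once output schedule; disks[i] = disk(b_i), o_step[i] = oStep(b_i)."""
    if not disks_distinct_per_step(disks, o_step):
        return False
    # Condition (ii): oBacklog(b_i) = |{j < i : oStep(b_j) >= oStep(b_i)}| < m
    return all(sum(o_step[j] >= o_step[i] for j in range(i)) < m
               for i in range(len(disks)))


def prefetch_schedule_ok(disks: Sequence[int], i_step: Sequence[int], m: int) -> bool:
    """Check a read-once prefetch schedule; blocks are consumed in list order."""
    if not disks_distinct_per_step(disks, i_step):
        return False
    L = len(disks)
    # iBacklog(b_i) = |{j > i : iStep(b_j) <= iStep(b_i)}| < m
    return all(sum(i_step[j] <= i_step[i] for j in range(i + 1, L)) < m
               for i in range(L))


def dual_prefetch(disks: Sequence[int], o_step: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Theorem 1: reverse the sequence and set iStep(b_i) = T - oStep(b_i) + 1."""
    T = max(o_step)
    return list(reversed(disks)), [T - s + 1 for s in reversed(o_step)]


disks, o_step = [0, 0, 1, 0, 1], [1, 2, 1, 3, 2]
rev_disks, i_step = dual_prefetch(disks, o_step)
for m in (2, 3):
    print(m, output_schedule_ok(disks, o_step, m), prefetch_schedule_ok(rev_disks, i_step, m))
```

For disks [0, 0, 1, 0, 1] and oStep = [1, 2, 1, 3, 2], the output schedule is correct for m = 3 but not for m = 2. The same holds for its dual prefetch schedule, as Theorem 1 predicts.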

3 Optimal Write-Once and Read-Once Scheduling 

We give an optimal algorithm for writing a write-once sequence, prove its op- 
timality and then apply the duality principle to transform it into a read-once 
prefetching algorithm. 

Consider the following algorithm greedyWriting for writing a sequence
$\Sigma = \langle b_0, \ldots, b_{L-1}\rangle$ of distinct blocks. Let Q denote the set of blocks in the buffer
pool, so initially $Q = \emptyset$. Let $Q_d = \{b \in Q : \mathrm{disk}(b) = d\}$. Write the blocks $b_i$ in
sequence as follows: (1) If |Q| < m then simply insert $b_i$ into Q. (2) Otherwise,
each disk with $Q_d \ne \emptyset$ outputs the block in $Q_d$ that appears first in Σ. The blocks
output are then removed from Q and $b_i$ is inserted into Q. (3) Once all blocks
are written the queues are flushed, i.e., additional output steps are performed
until Q is empty.
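A minimal Python rendering of greedyWriting follows. It assumes that blocks are given only by their disk numbers in the order of Σ; the function name and this representation are our assumptions. A per-disk FIFO queue holds $Q_d$, so the block output from a disk is always its earliest-written pending block.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence


def greedy_writing(disks: Sequence[int], m: int) -> List[int]:
    """Return oStep(b_i) for every block, following rules (1)-(3) of greedyWriting.

    disks[i] is disk(b_i); m >= 1 is the buffer-pool size; steps are numbered from 1.
    """
    if m < 1:
        raise ValueError("m must be positive")
    queues: Dict[int, Deque[int]] = {}   # Q_d as FIFO queues of block indices
    in_pool = 0                          # |Q|
    step = 0
    o_step = [0] * len(disks)

    def output_step() -> None:
        nonlocal in_pool, step
        step += 1
        for q in queues.values():        # every disk with nonempty Q_d outputs once
            if q:
                o_step[q.popleft()] = step   # the block of Q_d that appears first in Sigma
                in_pool -= 1

    for i, d in enumerate(disks):
        if in_pool >= m:                 # rule (2): pool full, one output step first
            output_step()
        queues.setdefault(d, deque()).append(i)   # insert b_i into Q
        in_pool += 1
    while in_pool > 0:                   # rule (3): flush the queues
        output_step()
    return o_step


print(greedy_writing([0, 0, 1, 0, 1], 2))
```

For disks [0, 0, 1, 0, 1] the sketch returns [1, 2, 2, 3, 3] with m = 2. That is three output steps, which matches the three blocks on disk 0. With m = 1 it degenerates to one block per step.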

Any schedule where blocks are output in arrival order on each disk is called a
FIFO schedule. The following lemma tells us that it is sufficient to consider FIFO
schedules when we look for optimal schedules. The proof is based on transforming
a non-FIFO schedule into a FIFO schedule by exchanging blocks in the schedule
of a disk that are output out of order.

Lemma 1. For any sequence of blocks Σ and every correct output schedule
oStep′ there is a FIFO output schedule oStep consisting of at most the same
number of output steps.
Algorithm greedyWriting is one way to compute a FIFO schedule. The
following lemma shows that greedyWriting outputs every block as early as possible.

Lemma 2. For any sequence of blocks Σ and any FIFO output schedule oStep′,
let oStep denote the schedule produced by algorithm greedyWriting. Then for all
$b_i \in \Sigma$, we have $\mathrm{oStep}(b_i) \le \mathrm{oStep}'(b_i)$.

Proof. (Outline) The proof is by induction over the number of blocks. There are 
two nontrivial cases. One case corresponds to the situation where the output 
step of a block immediately follows an output of a previous block on the same 
disk. The other case corresponds to the situation where no earlier step is possible 
because otherwise its oBacklog would be too large.

Combining Lemmas 1 and 2 we see that greedyWriting gives us optimal 
schedules for write-once sequences: 

Theorem 2. Algorithm greedyWriting gives a correct, minimum length output 
schedule for any write-once reference sequence Σ.

Combining the duality principle and the optimality of greedyWriting, we get 
an optimal algorithm for read-once prefetching that we call lazy prefetching: 

Corollary 1. An optimal prefetch schedule iStep for a sequence Σ can be
obtained by using greedyWriting to get an output schedule oStep for $\Sigma^R$ and setting
$\mathrm{iStep}(b_i) = T - \mathrm{oStep}(b_i) + 1$.

Note that the schedule can be computed in time O(L + D) using very simple
data structures.
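Corollary 1 translates directly into code: reverse the read sequence, run greedyWriting, and reverse time. The sketch below repeats a compact greedyWriting so that it is self-contained. Names are hypothetical, and blocks are again represented only by their disks. This illustration loops over all disks in every output step, so it does not attain the O(L + D) bound quoted above.

```python
from collections import deque
from typing import Dict, List, Sequence


def _greedy_writing(disks: Sequence[int], m: int) -> List[int]:
    # Compact greedyWriting: FIFO per disk, one parallel output step when the pool is full.
    queues: Dict[int, deque] = {}
    in_pool, step, o_step = 0, 0, [0] * len(disks)

    def output_step() -> None:
        nonlocal in_pool, step
        step += 1
        for q in queues.values():
            if q:
                o_step[q.popleft()] = step
                in_pool -= 1

    for i, d in enumerate(disks):
        if in_pool >= m:
            output_step()
        queues.setdefault(d, deque()).append(i)
        in_pool += 1
    while in_pool:
        output_step()
    return o_step


def lazy_prefetch(read_disks: Sequence[int], m: int) -> List[int]:
    """iStep for a read-once sequence, obtained by duality as in Corollary 1."""
    L = len(read_disks)
    o_rev = _greedy_writing(list(reversed(read_disks)), m)   # schedule for Sigma^R
    T = max(o_rev, default=0)
    i_step = [0] * L
    for k, s in enumerate(o_rev):
        i_step[L - 1 - k] = T - s + 1      # position k of Sigma^R is read L-1-k of Sigma
    return i_step


print(lazy_prefetch([1, 0, 1, 0, 0], 2))
```

For reads on disks [1, 0, 1, 0, 0] with m = 2 the sketch returns iStep = [1, 1, 2, 2, 3]. The first two reads are fetched together, and in this example every block is fetched as late as the buffer pool allows.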

4 How Good Is Optimal? 

When we are processing several streams concurrently, the knowledge that we 
have an optimal prefetching algorithm is often of little help. We also want to 
know “how good is optimal?” In the worst case, all requests may go to the same 
disk and no prefetching algorithm can cure the dreadful performance caused by 
this bottleneck. However, the situation is different if blocks are allocated to disks 
using striping, randomization², or both.

Theorem 3. Consider a sequence of L block requests, and a buffer pool of size 
m > D blocks. The number of I/O steps needed by greedyWriting or lazy prefetch- 
ing is given by the following bounds, depending on the allocation discipline. For 
striping and randomized cycling, an arbitrary interleaving of sequential accesses 
to S sequences is allowed. 

Striping: L/D + S, if m > S(D − 1);

Fully Random (FR): (l + 0(^)) -I- log m) (expected); 

Randomized Cycling (RC): (l -\- -\- min [S -\- j^, log m} (expected) 

² In practice, this will be done using simple hash functions. However, for the analysis
we assume that we have a perfect source of randomness.
Proof. (Outline) The bound for striped writing is based on the observation that
L/D + S is the maximum number of blocks to be handled by any disk and that
the oBacklog of any block can never exceed m if m > S(D − 1).

For fully random placement the key idea is that greedyWriting dominates
the “throttled” algorithm of [16], which admits only (1 − O(D/m))D blocks per
output step into the queues.

The bound for RC is a combination of the two previous bounds. The bound 
for FR applies to RC writing using the observation of [18] that the throttled 
algorithm of [16] performs at least as well for RC as for FR. 

The results for writing transfer to offline prefetching via duality. For the RC 
case we also need the observation that the reverse of a sequence using RC is 
indistinguishable from a nonreversed sequence. □ 
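The first observation in the striping part of the proof can be checked empirically. In the sketch below, which uses hypothetical names, S striped streams, each with a random first disk, are interleaved at random. No disk then receives more than L/D + S requests, because a stream of length ℓ places at most ⌈ℓ/D⌉ < ℓ/D + 1 blocks on any disk. Since each disk serves at most one block per step, the largest per-disk load is also a lower bound on the number of steps of any schedule.

```python
import random
from collections import Counter
from typing import List, Tuple


def interleave(streams: List[List[int]], rng: random.Random) -> List[int]:
    """Merge the streams in a random order that preserves each stream's own order."""
    pos = [0] * len(streams)
    live = [s for s in range(len(streams)) if streams[s]]
    out: List[int] = []
    while live:
        s = rng.choice(live)
        out.append(streams[s][pos[s]])
        pos[s] += 1
        if pos[s] == len(streams[s]):
            live.remove(s)
    return out


def striping_load_check(seed: int, D: int = 8, S: int = 5) -> Tuple[int, float]:
    """Return (largest per-disk load, L/D + S) for one random striped workload."""
    rng = random.Random(seed)
    streams = []
    for _ in range(S):
        length, start = rng.randint(1, 50), rng.randrange(D)
        streams.append([(start + j) % D for j in range(length)])   # striped stream
    requests = interleave(streams, rng)                            # disk of each request
    L = len(requests)
    return max(Counter(requests).values()), L / D + S


print(striping_load_check(0))
```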

For writing, the trailing additive term for each case enumerated in Theorem 3
can be dropped if the final contents of the buffer pool are not flushed.



5 Integrated Caching and Prefetching 

We now relax the condition that the read requests in Σ are for distinct blocks,
permitting the possibility of saving disk accesses by keeping previously accessed 
blocks in memory. For this read-many problem, we get a tradeoff for the use of 
the buffer pool because it has to serve the double purposes of keeping blocks 
that are accessed multiple times, and decoupling physical and logical accesses to 
equalize transient load imbalance of the disks. We define the write-many problem 
in such a way that the duality principle from Theorem 1 transfers: The latest 
instance of each block must be kept either on its assigned disk, or in the buffer 
pool. The final instance of each block must be written to its assigned disk.³

We prove that the following offline algorithm manyWriting minimizes the
number of output operations for the write-many problem: Let Q and $Q_d$ be
defined as for greedyWriting. To write block $b_i$, if $b_i \in Q$, the old version is
overwritten in its existing buffer. Otherwise, if |Q| < m, $b_i$ is inserted into Q.
If this also fails, an output step is performed before $b_i$ is inserted into Q. The
output analogue of Belady’s min rule [5] is used on each disk, i.e., each disk with
$Q_d \ne \emptyset$ outputs the block in $Q_d$ that is accessed again farthest in the future.
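A small simulation of manyWriting that counts output steps is sketched below under stated assumptions. Blocks are hashable identifiers, and `disk_of` maps each block to its disk. The final flush writes every buffered block. "Accessed again farthest in the future" is measured from the current position in the sequence, and a block that is never written again counts as infinitely far. All names are hypothetical.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def many_writing(seq: Sequence[Hashable], disk_of: Dict[Hashable, int], m: int) -> int:
    """Count the output steps of manyWriting (write-many, min rule on each disk)."""
    if m < 1:
        raise ValueError("m must be positive")
    positions: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, b in enumerate(seq):
        positions[b].append(idx)

    def next_use(b: Hashable, i: int) -> float:
        # Index of the next write of b at or after position i (inf if none).
        pos = positions[b]
        k = bisect.bisect_left(pos, i)
        return pos[k] if k < len(pos) else math.inf

    pool: set = set()          # Q: blocks whose latest version is buffered
    steps = 0

    def output_step(i: int) -> None:
        nonlocal steps
        steps += 1
        per_disk: Dict[int, List[Hashable]] = defaultdict(list)
        for b in pool:
            per_disk[disk_of[b]].append(b)
        for blocks in per_disk.values():
            # Output analogue of Belady's min rule: evict the farthest next use.
            pool.remove(max(blocks, key=lambda b: next_use(b, i)))

    for i, b in enumerate(seq):
        if b in pool:          # overwrite the buffered version in place
            continue
        if len(pool) >= m:     # no free buffer: one parallel output step first
            output_step(i)
        pool.add(b)
    while pool:                # final instances must reach their disks
        output_step(len(seq))
    return steps


print(many_writing(list("abca"), {"a": 0, "b": 0, "c": 0}, 2))
```

For the single-disk sequence a, b, c, a with m = 2, the min rule outputs b, which is never written again, when c arrives. This gives 3 output steps in total. Outputting a instead would force an extra output step when a is written again.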

Applying duality, we also get an optimal algorithm for integrated prefetching 
and caching of a sequence S: using the same construction as in Cor. 1 we get 
an optimal prefetching and caching schedule. It remains to prove the following 
theorem: 

Theorem 4. Algorithm manyWriting solves the write-many problem with the
fewest output steps.

³ The requirement that the latest versions have to be kept might seem odd in an offline
setting. However, this makes sense if there is a possibility that there are reads at
unknown times that need an up-to-date version of a block.
Proof. (Outline) We generalize the optimality proof of Belady’s algorithm by
Borodin and El-Yaniv [6] to the case of writing and multiple disks. Let
$\Sigma = \langle b_0, \ldots, b_{L-1}\rangle$ be any sequence of blocks to be written. The proof is based on
the following claim.

Claim: Let ALG be any algorithm for cached writing. Let d denote a fixed
disk. For any $0 \le i < L$ it is possible to construct an offline algorithm $\mathrm{ALG}_i$
that satisfies the following properties: (i) $\mathrm{ALG}_i$ processes the first i − 1 block
write requests exactly as ALG does. (ii) If block $b_i$ is written immediately after
output step s, then immediately before step s we had $b_i \notin Q$ and |Q| = m. (iii)
If $b_i$ is written immediately after output step s, then $\mathrm{ALG}_i$ performs this output
according to the min rule on disk d. (iv) $\mathrm{ALG}_i$ takes no more steps than ALG.

Once this claim is established, we can transform an optimal scheduling
algorithm OPT into an algorithm S that uses the min rule by iteratively applying
the claim for each disk and each $0 \le i < L$ without increasing the number of
output steps used.

To prove the claim we modify any algorithm ALG to get an algorithm $\mathrm{ALG}_i$
that fulfills all the properties. If property (ii) is violated, it suffices to write $b_i$
earlier. If property (iii) is violated, the output step s preceding the write of $b_i$
is modified on disk d to follow the min rule. Suppose $\mathrm{ALG}_i$ outputs block b in
step s. It remains to explain how $\mathrm{ALG}_i$ can mimic ALG in the subsequent steps
despite this difference in step s. A problem can only arise if ALG later overwrites
the current version of b. $\mathrm{ALG}_i$ exploits the fact that ALG either outputs nothing
or something that is accessed again before the next access to b. Either way, $\mathrm{ALG}_i$
can arrange to have an unused buffer block available when ALG overwrites the
current version of b. □



6 Application to Sorting 

Optimal algorithms for read-once prefetching or write-once output scheduling 
can be used to analyze or improve a number of interesting parallel disk sorting 
algorithms. We start by discussing multiway mergesort using randomized cycling 
allocation (RCM) in some detail and then survey a number of additional results.

Multiway mergesort is a frequently used external sorting algorithm. We
describe a variant that is similar to the SRM algorithm in [3]. Originally the N
input elements are stored as a single data stream using any kind of striping.
During run formation the input is read in chunks of size M that are sorted
internally and then written out in runs allocated using RC allocation. Neglecting
trivial rounding issues, run formation is easy to do using 2N/(DB) I/O steps.
By investing another $O(N/(DB^2))$ I/O steps we can keep triggers, the largest
keys of each block, in a separate file. Then we set aside a buffer pool of size
m = cD for some constant c and perform $\lceil \log_{M/B - O(D)}(N/M) \rceil$ merge phases. In a
merge phase, groups of $k = M/B - O(D)$ runs are merged into new sorted runs,
i.e., after the last merge phase, only one sorted run is left. Merging k runs of size
sB can be performed using s block reads by keeping one block of each run in the
internal memory of the sorting application. The order of these reads for an
entire phase can be exactly predicted using the trigger information and $O(N/(DB^2))$
I/Os for merging trigger files [3]. Hence, if we use optimal prefetching,
Theorem 3 gives an upper bound of $(1 + O(1/c))\,N/(DB) + \cdots$ for the number of fetch
steps of a phase. The number of output steps for a phase is N/(BD) if we have
an additional output buffer of D blocks. The final result is written using any
striped allocation strategy, i.e., the application calling the sorting routine need
not be able to handle RC allocation. We can write the resulting total number of
I/O steps as $\mathrm{Sort}_{a,f,m}(N)$, where



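$$\mathrm{Sort}_{a,f,m}(N) = \frac{2N}{DB} + a\,\frac{N}{DB}\left\lceil \log_{M/B - m} \frac{N}{M} \right\rceil + f + o\!\left(\frac{N}{DB}\right)$$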

Table 1 compares a selection of sorting algorithms using this generalized form
of the I/O bound for parallel disk sorting. (In the full paper we present additional
results, for example for FR allocation.) The term 2N/(DB) represents the reading and
writing of the input and the final output, respectively. The factor a is a constant
that dominates the I/O complexity for large inputs. Note that for a = 2 and
f = m = 0 this expression is the apparent lower bound for sorting. The additive
offset f may dominate for small inputs. The reduction of the memory by m
blocks in the base of the logarithm is due to memory that is used for output or
prefetching buffer pools outside the merging or distribution routines, and hence
reduces the number of data streams that can be handled concurrently. One way
to interpret m is to view it as the amount of additional memory needed to match
the performance of the algorithm on the multihead I/O model [1] (where load
balancing disk accesses is not an issue).

Even without any randomization, Theorem 3 shows that mergesort with
deterministic striping and optimal prefetching (SM) is at least as efficient as the
common practice of using superblock striping. However, both algorithms achieve
good performance only if a lot of internal memory is available.

Using previous work on distribution sort and the duality between prefetching
and writing, all results obtained for mergesort can be extended to distribution
sort (e.g., SD, SRD, FRD, RCD+). There are several sorting algorithms based
on the distribution principle, e.g., radix sort. The bounds given here are based
on a generalization of quicksort where k − 1 splitter elements are chosen to split
an unsorted input stream into k approximately equal sized output streams with
disjoint ranges of keys. After $\lceil \log_k (N/M) \rceil$ splitting phases, the remaining
streams can be sorted using internal sorting.

A simple variant of distribution sort with randomized cycling (RCD) was
already analyzed in [18]. The new variant, RCD+, has some practical
improvements (fewer tuning parameters, simpler application interface) and, it turns out

⁴ If we assume a fixed memory size we cannot discriminate between some of the
algorithms using the abstract I/O model. One algorithm may have a smaller factor a
yet need an extra distribution or merging phase for some input sizes N. In practice,
one could use a smaller block size for these input sizes. The abstract I/O model does
not tell us how this affects the total I/O time needed.
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Table 1. Summary of Main Results for I/O Complexity of Parallel Disk Sorting Algo- 
rithms. Algorithms with boldface names are asymptotically optimal: M/D = Merge 
/Distribution sort. SM/SD = merge / distribution sort with any striping (S) alloca- 
tion. SRM and SRD use Simple Randomized striping (SR). RCD, RCD-I- and RCM 
use Randomized Cycling (RC) allocation. 
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that the additive term / can also be eliminated. Using a careful formulation 
of the algorithmic details it is never necessary to flush the write buffers. All in 
all, RCD-h is currently the parallel disk sorting algorithm with the best I/O 
performance bounds known. 
Abstract. We introduce a new problem that was motivated by a (more complicated) problem arising in a robotized assembly environment. The bin coloring problem is to pack unit size colored items into bins such that the maximum number of different colors per bin is minimized. Each bin has size $B \in \mathbb{N}$. The packing process is subject to the constraint that at any moment in time at most $q \in \mathbb{N}$ bins are partially filled. Moreover, bins may only be closed if they are filled completely. An online algorithm must pack each item without knowledge of any future items.

We investigate the existence of competitive online algorithms for the bin coloring problem. We prove an upper bound of $3q - 1$ and a lower bound of $2q$ for the competitive ratio of a natural greedy-type algorithm. Surprisingly, a trivial algorithm which uses only one open bin has a strictly better competitive ratio of $2q - 1$. Moreover, we show that any deterministic algorithm has a competitive ratio of $\Omega(q)$. Randomization does not improve this lower bound, even when the adversary is oblivious.



1 Introduction 

One of the commissioning departments in the distribution center of Herlitz PBS AG, Falkensee, one of the main distributors of office supplies in Europe, is devoted to greeting cards. The cards are stored in parallel shelving systems. Order pickers on automated guided vehicles collect the orders from the storage systems, following a circular course through the shelves. At the loading zone, which can hold $q$ vehicles, each vehicle is logically "loaded" with $B$ orders which arrive online. The goal is to avoid congestion among the vehicles (see […] for details).
The vehicles are unable to pass each other, and the "speed" of a vehicle is correlated to the number of different stops it must make. This motivates assigning the orders in such a way that the vehicles stop as few times as possible.

* Research supported by the German Science Foundation (DFG, grant GR 883/9-10)

** Supported by the TMR Network DONET of the European Community ERB TMRX-CT98-0202

F. Meyer auf der Heide (Ed.): ESA 2001, LNCS 2161, pp. 74–…, 2001.
© Springer-Verlag Berlin Heidelberg 2001

Online Bin Coloring

The above situation motivated the following bin coloring problem. One receives a sequence of unit size items $r_1, \ldots, r_m$, where each item has a color $r_i \in \mathbb{N}$, and is asked to pack them into bins with size $B$. The goal is to pack the items into the bins "most uniformly", that is, to minimize the maximum number of different colors assigned to a bin. The packing process is subject to the constraint that at any moment in time at most $q \in \mathbb{N}$ bins may be partially filled. Bins may only be closed if they are filled completely. (Without these strict bounded space constraints the problem is trivial, since each item could then be packed into a separate bin.) In the online version of the problem, denoted by OlBcp, each item must be packed without knowledge of any future items. An online algorithm is $c$-competitive if, for all possible request sequences, the ratio is bounded by $c$. Here the ratio compares the maximum number of colors in a bin packed by the algorithm with the same quantity for an optimal offline solution. The OlBcp can be viewed as a variant of the bounded space bin packing problem […] (see […] for recent surveys on bin packing problems).

Our investigations of the OlBcp reveal a curiosity of competitive analysis. A truly stupid algorithm achieves an essentially best possible (non-trivial) competitive ratio, whereas a seemingly reasonable algorithm performs provably worse in terms of competitive analysis.

We first analyze a natural greedy-type strategy. We show that its competitive ratio is no greater than $3q$ but no smaller than $2q$, where $q$ is the maximum number of open bins. We show that a trivial strategy that uses only one open bin has a strictly better competitive ratio of $2q - 1$. Then we show that, surprisingly, no deterministic algorithm can be substantially better than the trivial strategy. More specifically, we prove that no deterministic algorithm can, in general, have a competitive ratio less than $q$. Even more surprisingly, the lower bound of $q$ on the competitive ratio continues to hold for randomized algorithms against an oblivious adversary. Finally, not even "resource augmentation" can help to overcome the lower bound of $\Omega(q)$ on the competitive ratio. Here resource augmentation means that the online algorithm is allowed to use a fixed number $q' > q$ of open bins.

2 Problem Definition 

Definition 2.1 (Online Bin Coloring Problem). In the Online Bin Coloring Problem (OlBcp$_{B,q}$) with parameters $B, q \in \mathbb{N}$ ($B, q \ge 2$), one is given a sequence $\sigma = r_1, \ldots, r_m$ of unit size items (requests), each with a color $r_i \in \mathbb{N}$. One is asked to pack them into bins with size $B$, that is, each bin can accommodate exactly $B$ items. The packing is subject to the following constraints:

1. The items must be packed according to the order of their appearance, that is, 
item i must be packed before item k for all i < k. 

2. At most q partially filled bins may be open to further items at any point in 
the packing process. 

3. A bin may only be closed if it is filled completely, i.e., if it has been assigned 
exactly B items. 
The objective is to minimize the maximum number of different colors assigned to a bin. An online algorithm for OlBcp$_{B,q}$ must pack each item $r_i$ (irrevocably) without knowledge of requests $r_k$ with $k > i$.

In the sequel it will occasionally be helpful to use the following view of the bins used by an arbitrary algorithm ALG to process an input sequence $\sigma$. Each open bin has an index $x$, where $1 \le x \le q$. Each time a bin with index $x$ is closed and a new bin is opened, the new bin will also have index $x$. If no confusion can occur, we will refer to a bin with index $x$ as bin $x$.
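The index view suggests a compact way to evaluate a packing, sketched below in Python under our reading of Definition 2.1. With $q$ indices there are never more than $q$ partially filled bins. A bin is closed exactly when it holds $B$ items. So any assignment of requests to indices is a feasible packing. The helper names are hypothetical and indices are 0-based. The exhaustive `brute_force_opt` is only meant for tiny instances.

```python
from itertools import product

def packing_cost(colors, assignment, B):
    """Maximum number of distinct colors in any bin of an index-view packing.

    colors[i]     -- color of request r_{i+1}
    assignment[i] -- 0-based bin index receiving request r_{i+1}
    Each index holds one open bin; once it contains B items it is closed and a
    fresh bin with the same index takes its place.
    """
    current = {}                  # index -> colors in its currently open bin
    worst = 0
    for c, x in zip(colors, assignment):
        bin_ = current.setdefault(x, [])
        bin_.append(c)
        worst = max(worst, len(set(bin_)))
        if len(bin_) == B:        # bin is full: close it and reuse the index
            current[x] = []
    return worst

def brute_force_opt(colors, B, q):
    """Optimal offline cost by trying all q**m index assignments (tiny m only)."""
    return min(packing_cost(colors, a, B)
               for a in product(range(q), repeat=len(colors)))
```

For the sequence of colors 1, 2, 1, 2 with $B = 2$, two bins give an optimal cost of 1 (one color per index). With a single bin the cost is 2.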

We denote by ALG($\sigma$) the objective function value of the solution produced by algorithm ALG on input $\sigma$. OPT denotes an optimal offline algorithm which has complete knowledge of the input sequence $\sigma$ in advance. However, the packing must still obey constraints 1 to 3 specified in Definition 2.1.

Definition 2.2. A deterministic online algorithm ALG for OlBcp$_{B,q}$ is called $c$-competitive if $\mathrm{ALG}(\sigma) \le c \cdot \mathrm{OPT}(\sigma)$ holds for any request sequence $\sigma$.

The competitive ratio of an algorithm ALG is the smallest number $c$ such that ALG is $c$-competitive. The bin size $B$ is a trivial upper bound on the competitive ratio of any algorithm for OlBcp$_{B,q}$.

A randomized online algorithm is a probability distribution over a set of deterministic online algorithms. The objective value produced by a randomized algorithm is therefore a random variable. In this paper we analyze the performance of randomized online algorithms only against an oblivious adversary. An oblivious adversary does not see the realizations of the random choices made by the online algorithm and therefore has to generate a request sequence in advance. We refer to […] for details on the various adversary models.

Definition 2.3. A randomized online algorithm RALG is $c$-competitive against an oblivious adversary if $\mathbb{E}[\mathrm{RALG}(\sigma)] \le c \cdot \mathrm{OPT}(\sigma)$ for any request sequence $\sigma$.

3 The Algorithm GREEDYFIT

In this section we introduce a natural greedy-type strategy, which we call 
GREEDYFIT, and show that the competitive ratio of this strategy is at most 3q 
but no smaller than 2q (provided the capacity B is sufficiently large). 

GREEDYFIT: If, upon the arrival of request $r_i$, its color is already contained in one of the currently open bins, say bin $b$, then put $r_i$ into bin $b$. Otherwise put item $r_i$ into a bin that contains the least number of different colors. This means opening a new bin if currently fewer than $q$ bins are non-empty.
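A direct Python transcription of GREEDYFIT in the index view is given below. The rule above leaves ties open, so ties between bins with equally few colors are broken by lowest index; this is our assumption. An empty bin counts as having zero colors and is therefore preferred, which matches "opening a new bin".

```python
def greedyfit(colors, B, q):
    """Run GREEDYFIT on the request colors and return its objective value."""
    bins = [[] for _ in range(q)]      # bins[x]: colors in the open bin with index x
    worst = 0
    for c in colors:
        # A bin that already holds color c, if any
        # (GREEDYFIT keeps each color in at most one open bin).
        x = next((i for i, b in enumerate(bins) if c in b), None)
        if x is None:
            # Fewest distinct colors; ties go to the lowest index (assumption).
            x = min(range(q), key=lambda i: len(set(bins[i])))
        bins[x].append(c)
        worst = max(worst, len(set(bins[x])))
        if len(bins[x]) == B:          # a full bin is closed; index x reopens empty
            bins[x] = []
    return worst
```

For instance, with $q = 2$ and $B = 4$ the colors 1, 2, 3, 4 end up as $\{1, 3\}$ and $\{2, 4\}$, so the cost is 2.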

The analysis of the competitive ratio of GREEDYFIT is essentially a pigeonhole principle argument. We first show an upper bound on the number of bins that any algorithm can use to distribute the items of a contiguous subsequence. We then relate this number to the number of colors in the input sequence.
Lemma 3.1. Let $\sigma = r_1, \ldots, r_m$ be any request sequence and let $\sigma' = r_i, \ldots, r_{i+\ell-1}$ be any contiguous subsequence of $\sigma$ of length $\ell$. Then any algorithm packs the items of $\sigma'$ into at most $2q + \lfloor (\ell - 2q)/B \rfloor$ different bins.

Proof. Let ALG be any algorithm and let $b_1, \ldots, b_t$ be the set of open bins for ALG just prior to the arrival of the first item of $\sigma'$. Denote by $f(b_j) \in \{1, \ldots, B - 1\}$ the empty space in bin $b_j$ at that moment in time. To close an open bin $b_j$, ALG needs $f(b_j)$ items. Opening and closing an additional new bin needs $B$ items. To achieve the maximum number of bins ($> 2q$), ALG must first close each open bin and put at least one item into each newly opened bin. From this moment on, opening a new bin requires $B$ new items. It follows that the maximum number of bins ALG can use is bounded from above as claimed in the lemma. □



Theorem 3.2. GREEDYFIT is $c$-competitive for OlBcp$_{B,q}$ with $c = \min\{2q + \lfloor (qB - 3q + 1)/B \rfloor, B\} \le \min\{3q - 1, B\}$.

Proof. Let $\sigma$ be any request sequence and suppose $\mathrm{GREEDYFIT}(\sigma) = w$. It suffices to consider the case $w \ge 2$. Let $s$ be minimum with the property that $\mathrm{GREEDYFIT}(r_1, \ldots, r_{s-1}) = w - 1$ and $\mathrm{GREEDYFIT}(r_1, \ldots, r_s) = w$. By the construction of GREEDYFIT, after processing $r_1, \ldots, r_{s-1}$ each of the currently open bins must contain exactly $w - 1$ different colors. Moreover, since $w \ge 2$, after additionally processing request $r_s$, GREEDYFIT has exactly $q$ open bins. As an exception, we count the bin into which $r_s$ is packed as open, even if this assignment has just closed it. Denote those bins by $b_1, \ldots, b_q$.

Let $b_j$ be the bin among $b_1, \ldots, b_q$ that was opened last by GREEDYFIT, and let $r_{s'}$ be the first item that was assigned to $b_j$. Then the subsequence $\sigma' = r_{s'}, \ldots, r_s$ consists of at most $qB - (q - 1)$ items. This is because no bin is closed between $r_{s'}$ and $r_s$, and at the moment $r_{s'}$ was processed, $q - 1$ bins already contained at least one item. Moreover, $\sigma'$ contains items with at least $w$ different colors. We apply Lemma 3.1 with $\ell \le qB - q + 1$. It shows that OPT distributes the items of $\sigma'$ into at most $2q + \lfloor (qB - 3q + 1)/B \rfloor$ bins. By the pigeonhole principle, some bin of OPT receives at least $w / (2q + \lfloor (qB - 3q + 1)/B \rfloor)$ of these colors. Consequently,
$$\mathrm{OPT}(\sigma) \ge \frac{w}{2q + \lfloor (qB - 3q + 1)/B \rfloor}. \qquad \square$$
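The arithmetic behind Theorem 3.2 can be checked numerically. The Python sketch below (the function names are ours) evaluates the bound of Lemma 3.1 at the largest possible length $\ell = qB - q + 1$ of $\sigma'$. It then confirms, over a range of parameters, that the resulting ratio never exceeds $\min\{3q - 1, B\}$.

```python
def lemma31_bound(ell, B, q):
    """Lemma 3.1: bins usable by any algorithm for ell consecutive requests."""
    return 2 * q + (ell - 2 * q) // B      # // is floor division, also for negatives

def theorem32_ratio(B, q):
    """Theorem 3.2: c = min{2q + floor((qB - 3q + 1)/B), B}."""
    ell = q * B - (q - 1)                  # longest possible sigma' in the proof
    return min(lemma31_bound(ell, B, q), B)

# Sanity check of the final inequality c <= min{3q - 1, B}.
for q in range(2, 11):
    for B in range(2, 60):
        assert theorem32_ratio(B, q) <= min(3 * q - 1, B)
```

For instance, `theorem32_ratio(100, 3)` evaluates $2 \cdot 3 + \lfloor 292/100 \rfloor = 8$, which equals $3q - 1$.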

We now prove a lower bound on the competitive ratio of GREEDYFIT.

Theorem 3.3. GREEDYFIT has a competitive ratio greater than or equal to $2q$ for OlBcp$_{B,q}$ if $B \ge 2q^3 - q^2 - q + 1$.

Proof. We construct a request sequence $\sigma$ that consists of a finite number $M$ of phases, in each of which $qB$ requests are given. The sequence is constructed in such a way that after each phase the adversary has $q$ empty bins.

Each phase consists of two steps. In the first step $q^2$ items are presented, each with a new color which has not been used before. In the second step $qB - q^2$ items are presented, all with a color that has occurred before. We will show that we can choose the items given in Step 2 of every phase such that the following properties hold for the bins of GREEDYFIT:

Property 1. The bins with indices $1, \ldots, q - 1$ are never closed.

Property 2. The bins with indices $1, \ldots, q - 1$ contain only items of different colors.

Property 3. There is an $M \in \mathbb{N}$ such that during Phase $M$ GREEDYFIT assigns, for the first time, an item with a new color to a bin that already contains items with $2q^2 - 1$ different colors.

Property 4. There is an assignment of the items of $\sigma$ such that no bin contains items with more than $q$ different colors.

We analyze the behavior of GREEDYFIT by distinguishing between the items assigned to the bin (with index) $q$ and the items assigned to bins (with indices) 1 through $q - 1$. Let $L_k$ be the set of colors of the items assigned to bins $1, \ldots, q - 1$ during Step 1 of Phase $k$. Let $R_k$ be the set of colors assigned to bin $q$ during the same step.

We now describe a general construction of the request sequence given in Step 2 of a phase. During Step 1 of Phase $k$, items with $|R_k|$ different colors are assigned to bin $q$. For the moment, suppose that $|R_k| \ge q$ (see Lemma 3.6 (iv)). We now partition the at most $q^2$ colors in $R_k$ into $q$ disjoint non-empty sets $S_1, \ldots, S_q$. We give $qB - q^2$ items with colors from $R_k$ such that the number of items with colors from $S_j$ is $B - q$ for every $j$, and the last $|R_k|$ items all have different colors. GREEDYFIT will pack all items given in Step 2 into bin $q$ (Lemma 3.6 (iii)). Hence bins $1, \ldots, q - 1$ only get assigned items during Step 1, which implies Properties 1 and 2.

The adversary assigns the items of Step 1 such that every bin receives $q$ items, and the items with colors in the color set $S_j$ go to bin $j$. Clearly, the items in every bin have no more than $q$ different colors. By construction of the sequence, the items given in Step 2 can be assigned to the bins of the adversary such that all bins are completely filled. The number of different colors per bin does not increase, which ensures that Property 4 is satisfied. Due to lack of space we omit the proofs of the following lemmas.

Lemma 3.4. At the end of Phase $k < M$, bin $q$ of GREEDYFIT contains exactly $B - \sum_{j=1}^{k} |L_j|$ items, and this number is at least $q^2$. □



Corollary 3.5. For any Phase k < M , bin q is never closed by GREEDYFIT 
before the end of Step 1 of Phase k. □ 



Lemma 3.6. For $k \ge 1$ the following statements are true:

(i) At the beginning of Phase $k$, bin $q$ of GREEDYFIT contains exactly the colors from $R_{k-1}$ (where $R_0 := \emptyset$).

(ii) After Step 1 of Phase $k$, each of the bins $1, \ldots, q - 1$ of GREEDYFIT contains at least $|R_k| + |R_{k-1}| - 1$ different colors.

(iii) In Step 2 of Phase $k$, GREEDYFIT packs all items into bin $q$.

(iv) $|R_k| \ge q$. □

So far we have shown that the sequence can actually be constructed as described, and that the optimal offline cost on this sequence is no more than $q$. It remains to prove that there is a number $M \in \mathbb{N}$ such that, after $M$ phases, some bin of GREEDYFIT contains items with $2q^2$ different colors. We do this by establishing the following lemma:

Lemma 3.7. In every two subsequent Phases $k$ and $k + 1$, either $|L_k \cup L_{k+1}| > 0$ or bin $q$ contains items with $2q^2$ different colors during one of the phases. □

We can conclude from Lemma 3.7 that at least once every two phases the number of items in bins 1 through $q - 1$ grows. These bins are never closed (Property 1), and their items all have distinct colors (Property 2). Hence, after a finite number $M$ of phases, one of the bins of GREEDYFIT must contain items with $2q^2$ different colors. Since the offline cost is at most $q$ (Property 4), this completes the proof of the theorem. □

4 The Trivial Algorithm ONEBIN

This section is devoted to arguably the simplest (and most trivial) algorithm for the OlBcp. Surprisingly, it has a better competitive ratio than GREEDYFIT. Moreover, as we will see later, this algorithm achieves essentially the best competitive ratio for the problem.

ONEBIN: The next item is packed into the (at most one) open bin. A new bin 
is opened only if the previous item has closed the previous bin by filling it 
up completely. 

The proof of the upper bound on the competitive ratio of ONEBIN follows the same lines as that for GREEDYFIT.

Lemma 4.1. Let $\sigma = r_1, \ldots, r_m$ be any request sequence. Then for $i \ge 0$ any algorithm packs the items $r_{iB+1}, \ldots, r_{(i+1)B}$ into at most $\min\{2q - 1, B\}$ bins.

Proof. Omitted in this abstract. 

Theorem 4.2. Algorithm ONEBIN is $c$-competitive for OlBcp$_{B,q}$ with $c = \min\{2q - 1, B\}$.

Proof. Omitted in this abstract. 

The competitive ratio proved in the previous theorem is tight, as the following example shows. Let $B > 2q - 1$. First we give $(q - 1)B$ items, after which, by definition, ONEBIN has only empty bins. These items have $q$ different colors: every color but one occurs $B - 1$ times, and one color occurs only $q - 1$ times. The adversary assigns all items of the same color to the same bin, using one color per bin. After this, $q$ items with all the different colors used before are requested. The adversary can now close $q - 1$ bins, still using only one color per bin, whereas ONEBIN ends up with $q$ different colors in its bin. Then $q - 1$ items with new (previously unused) colors are given. The adversary can assign every item to an empty bin and thus still has only one color per bin. ONEBIN, by contrast, puts these items into the bin that already contains $q$ different colors. Hence ONEBIN ends with $2q - 1$ colors in one bin against an offline cost of 1.
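The tightness example can be replayed mechanically with the Python sketch below. It implements ONEBIN and builds the request sequence described above. The first $(q-1)B$ items are presented grouped by color, an ordering the text leaves open. It then evaluates the adversary's packing in the index view. Colors $0, \ldots, q-1$ go to the index equal to the color, and the $q - 1$ new colors reuse indices $0, \ldots, q-2$. All names are illustrative.

```python
def onebin(colors, B):
    """ONEBIN: a single open bin; a new one is opened only after it is full."""
    current, worst = [], 0
    for c in colors:
        current.append(c)
        worst = max(worst, len(set(current)))
        if len(current) == B:
            current = []
    return worst

def tight_example(B, q):
    """Request colors of the tightness example (requires B > 2q - 1)."""
    seq = []
    for color in range(q - 1):              # q - 1 colors with B - 1 items each
        seq += [color] * (B - 1)
    seq += [q - 1] * (q - 1)                # one color with only q - 1 items
    seq += list(range(q))                   # one more item of each old color
    seq += list(range(q, 2 * q - 1))        # q - 1 previously unused colors
    return seq

def index_view_cost(colors, assignment, B):
    """Objective of an offline packing given as a bin index per request."""
    open_bins, worst = {}, 0
    for c, x in zip(colors, assignment):
        b = open_bins.setdefault(x, [])
        b.append(c)
        worst = max(worst, len(set(b)))
        if len(b) == B:                     # closed; the index gets a fresh bin
            open_bins[x] = []
    return worst

def adversary_assignment(B, q):
    """Old color c -> index c; new color c -> index c - q (a bin closed earlier)."""
    return [c if c < q else c - q for c in tight_example(B, q)]

B, q = 6, 3
seq = tight_example(B, q)
print(onebin(seq, B), index_view_cost(seq, adversary_assignment(B, q), B))  # 5 1
```

With $B = 6$ and $q = 3$, ONEBIN reaches $2q - 1 = 5$ colors in one bin while the adversary keeps one color per bin.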
5 A General Lower Bound for Deterministic Algorithms

In this section we prove a general lower bound on the competitive ratio of any deterministic online algorithm for the OlBcp. We establish a lemma which immediately leads to the desired lower bound but is even more powerful. In particular, this lemma will allow us to derive essentially the same lower bound for randomized algorithms in Section 6.

In the sequel we will have to refer to the “state” of (the bins managed by) 
an algorithm ALG after processing a prefix of a request sequence a. To this end 
we introduce the notion of a C- configuration. 

Definition 5.1 (C-configuration). Let $C$ be a set of colors. A $C$-configuration is a packing of items with colors from $C$ into at most $q$ bins. More formally, a $C$-configuration can be defined as a mapping $K\colon \{1, \ldots, q\} \to S^{C}_{\le B}$, where
$$S^{C}_{\le B} := \{\, S \mid S \text{ is a multiset over } C \text{ containing at most } B \text{ elements} \,\},$$
with the interpretation that $K(j)$ is the multiset of colors contained in bin $j$. We omit the reference to the set $C$ if it is clear from the context.
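As a data-structure illustration, a $C$-configuration can be stored as a map from bin indices to multisets, for example Python `collections.Counter` objects. The sketch below uses hypothetical helper names. It checks the conditions of Definition 5.1. It also computes $n(j)$ and $f(j)$, the number of distinct colors and the free space of a bin, which the proof of Lemma 5.2 uses. Indices missing from the dictionary are treated as empty bins; this is our convention.

```python
from collections import Counter

def is_c_configuration(K, C, B, q):
    """K: dict mapping bin indices 1..q to Counters of colors (missing = empty)."""
    if not set(K) <= set(range(1, q + 1)):
        return False
    for ms in K.values():
        if any(m < 0 for m in ms.values()):     # multiplicities must be non-negative
            return False
        if not set(+ms) <= set(C) or sum(ms.values()) > B:
            return False
    return True

def n(ms):
    """Number of different colors in a bin."""
    return len(+ms)                             # unary + drops zero counts

def f(ms, B):
    """Free space left in a bin of capacity B."""
    return B - sum(ms.values())
```

A bin holding two items of color `a` and one of color `b` has $n = 2$ and, for $B = 3$, $f = 0$.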



Lemma 5.2. Let $B, q, s \in \mathbb{N}$ such that $s \ge 1$ and the inequality $B/q \ge s - 1$ holds. There exist a finite set $C$ of colors and a constant $L \in \mathbb{N}$ with the following property. For any deterministic algorithm ALG and any $C$-configuration $K$ there exists an input sequence $\sigma_{\mathrm{ALG},K}$ of OlBcp$_{B,q}$ such that

(i) The sequence $\sigma_{\mathrm{ALG},K}$ uses only colors from $C$ and $|\sigma_{\mathrm{ALG},K}| \le L$, that is, $\sigma_{\mathrm{ALG},K}$ consists of at most $L$ requests.

(ii) If ALG starts with initial $C$-configuration $K$, then $\mathrm{ALG}(\sigma_{\mathrm{ALG},K}) \ge (s - 1)q$.

(iii) If OPT starts with the empty configuration (i.e., all bins are empty), then $\mathrm{OPT}(\sigma_{\mathrm{ALG},K}) \le s$. Additionally, OPT can process the sequence in such a way that at the end the empty configuration is attained again.

Moreover, all of the above statements remain true even if the online algorithm is allowed to use $q' > q$ bins instead of $q$, while the offline adversary still uses only $q$ bins. In this case, the constants $|C|$ and $L$ depend only on $q'$, not on the particular algorithm ALG.

Proof. Let $C = \{c_1, \ldots, c_{(s-1)^2 q^2 q'}\}$ be a set of $(s - 1)^2 q^2 q'$ colors, and let ALG be any deterministic online algorithm which starts with some initial $C$-configuration $K$.

The construction of the request sequence $\sigma_{\mathrm{ALG},K}$ works in phases. At the beginning of each phase the offline adversary has all bins empty. During the run of the request sequence, a subset of the currently open bins of ALG will be marked. We denote by $P_k$ the subset of marked bins at the beginning of Phase $k$. Initially $P_1 = \emptyset$, and during some Phase $M$, one bin in $P_M$ will contain at least $(s - 1)q$ colors. To ensure that this goal can in principle be achieved, we keep an invariant: for each bin $b \in P_k$, the number of different colors in $b$ plus the free space in $b$ is at least $(s - 1)q$. In other words, each bin $b \in P_k$ could potentially still be forced to contain at least $(s - 1)q$ different colors. For technical reasons, $P_k$ is only a subset of the bins with this property.

For bin $j$ of ALG we denote by $n(j)$ the number of different colors currently in bin $j$ and by $f(j)$ the space left in bin $j$. Then every bin $j \in P_k$ satisfies $n(j) + f(j) \ge (s - 1)q$. By $\min P_k := \min_{j \in P_k} n(j)$ we denote the minimum number of colors in a bin from $P_k$.

We now describe Phase $k$ with $1 \le k \le q(s - 1)q'$. The adversary selects a set of $(s - 1)q$ new colors $C_k = \{c_1, \ldots, c_{(s-1)q}\}$ from $C$, not used in any earlier phase. It then presents one item of each color in the order
$$c_1, c_2, \ldots, c_{(s-1)q},\; c_1, c_2, \ldots, c_{(s-1)q},\; c_1, c_2, \ldots \qquad (1)$$

until one of the following cases occurs:

Case 1: ALG puts an item into a bin $p \in P_k$. In this case we let $Q := P_k \setminus \{ j \in P_k : n(j) < n(p) \}$, that is, we remove all bins from $P_k$ which have fewer than $n(p)$ colors. Notice that $\min_{j \in Q} n(j) > \min P_k$, since the number of different colors in bin $p$ increases.

Case 2: ALG puts an item into some bin $j \notin P_k$ which satisfies
$$n(j) + f(j) \ge (s - 1)q. \qquad (2)$$
In this case we set $Q := P_k \cup \{j\}$ (we tentatively add bin $j$ to the set $P_k$).

Notice that after a finite number of requests one of these two cases must occur. Let $b_1, \ldots, b_t$ be the set of currently open bins of ALG. Suppose ALG never puts an item into a bin from $P_k$. Then at some point all bins of $\{b_1, \ldots, b_t\} \setminus P_k$ are filled, and ALG must open a new bin, say bin $j$, by putting the new item into it. At this moment bin $j$ satisfies $n(j) = 1$ and $f(j) = B - 1$. Hence $n(j) + f(j) = B \ge (s - 1)q$, which gives (2).

The adversary started the phase with all bins empty, and during the current phase we have given no more than $(s - 1)q$ colors. Hence the adversary can assign the items to bins such that no bin contains more than $s - 1$ different colors; we describe below how this is done precisely. Because of the stopping criteria above (Case 1 and Case 2), we may in fact have presented fewer than $(s - 1)q$ colors so far.

In the sequel we imagine that each currently open bin of the adversary has an index $x$, where $1 \le x \le q$. Let $\beta\colon C_k \to \{1, \ldots, q\}$ be any mapping of the colors from $C_k$ to the offline bin indices such that $|\beta^{-1}(\{x\})| \le s - 1$ for $x = 1, \ldots, q$. We imagine color $c_r$ to "belong" to the bin with index $\beta(c_r)$, even if no item of this color has been presented (yet). Each item already presented in Phase $k$ with color $c_r$ goes into the currently open bin with index $\beta(c_r)$. If there is no open bin with index $\beta(c_r)$ when the item arrives, the adversary opens a new bin with index $\beta(c_r)$ to accommodate the item.
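One admissible choice of $\beta$ simply chops the list of phase colors into consecutive groups of $s - 1$. The Python sketch below uses illustrative names and assumes $s \ge 2$. It builds such a $\beta$ and generates a prefix of the cyclic order (1) with `itertools.cycle`. It then packs that prefix the way the adversary does. Each offline bin therefore only ever sees colors from its own preimage $\beta^{-1}(\{x\})$, which means at most $s - 1$ colors.

```python
from itertools import cycle, islice

def make_beta(phase_colors, q, s):
    """Map the phase colors to indices 1..q with at most s - 1 colors per index."""
    if len(phase_colors) > (s - 1) * q:
        raise ValueError("a phase uses at most (s - 1) q colors")
    return {c: r // (s - 1) + 1 for r, c in enumerate(phase_colors)}

def phase_prefix(phase_colors, length):
    """First `length` requests of order (1): c1, c2, ..., c1, c2, ..."""
    return list(islice(cycle(phase_colors), length))

def adversary_cost(requests, beta, B):
    """Pack each request into the open offline bin with index beta(color)."""
    bins, worst = {}, 0
    for c in requests:
        b = bins.setdefault(beta[c], [])
        b.append(c)
        worst = max(worst, len(set(b)))
        if len(b) == B:                  # bin full: closed, same index reopens
            bins[beta[c]] = []
    return worst
```

With $q = 3$, $s = 3$ and the six colors `a` to `f`, every prefix of the cyclic order is packed with at most $2 = s - 1$ colors per bin.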

Our goal now is to clear all open offline bins so that we can start a new phase. During our clearing loop the offline bin with index $x$ might be closed and replaced by an empty bin multiple times. Each time a bin with index $x$ is replaced by an empty bin, the new bin will also have index $x$. The bin with index $x$ receives a color not in $\beta^{-1}(\{x\})$ at most once. This ensures that the optimum offline cost remains bounded from above by $s$. The clearing loop works as follows:

1. (Start of clearing loop iteration) Choose a color $c^* \in C_k$ which is not contained in any bin from $Q$. If there is no such color, go to the "good end" of the clearing loop (Step 4).

2. Let $F < qB$ denote the current total empty space in the open offline bins. Present items of color $c^*$ until one of the following things happens:

Case (a): At some point in time ALG puts the $\ell$th item with color $c^*$ into a bin $j \in Q$, where $1 \le \ell \le F$. Notice that the number of different colors in $j$ increases. Let $Q' := Q \setminus \{ b \in Q : n(b) < n(j) \}$; in other words, we remove all bins $b$ from $Q$ which currently have fewer than $n(j)$ colors. This guarantees that $\min_{b \in Q'} n(b) > \min_{b \in Q} n(b) > \min P_k$. The adversary puts all $\ell$ items of color $c^*$ into bins with index $\beta(c^*)$. During this process the open bin with index $\beta(c^*)$ might fill up and be replaced by a new empty bin with the same index.

Set $Q := Q'$ and go to the start of the next clearing loop iteration (Step 1). Notice that the number of colors from $C_k$ which are contained in $Q$ decreases by one, but $\min_{b \in Q} n(b)$ increases.

Case (b): $F$ items of color $c^*$ have been presented, but ALG has not put any of these items into a bin from $Q$.

In this case, the offline adversary processes these items differently from case (a). The $F$ items of color $c^*$ are used to fill up exactly the $F$ empty places in all currently open offline bins. Up to this point, each offline bin with index $x$ had received colors only from the set $\beta^{-1}(\{x\})$ of at most $s - 1$ colors. It follows that no offline bin has contained more than $s$ different colors. We close the clearing loop by proceeding as specified in the next step.

3. (Standard end of clearing loop iteration)

If we have reached this step, all offline bins have been cleared; we can only arrive here from case (b) above. We set $P_{k+1} := Q$ and end both the clearing loop and the current Phase $k$.

4. (Good end of clearing loop iteration)

We have reached the point where all colors from $C_k$ are contained in a bin from $Q$. Before the first iteration, exactly one color from $C_k$ was contained in $Q$. The number of colors from $C_k$ contained in bins from $Q$ can only increase by one, which happens in case (a) above, if $\min_{b \in Q} n(b)$ increases. Hence, if all colors from $C_k$ are contained in bins from $Q$, then $\min_{b \in Q} n(b)$ must have increased $(s - 1)q - 1$ times. This implies $\min_{b \in Q} n(b) \ge (s - 1)q$. In other words, one of ALG's bins in $Q$ contains at least $(s - 1)q$ different colors. The only thing left to do is to append a suitable suffix to the sequence constructed so far such that all open offline bins are closed. Clearly this can be done without increasing the offline cost.

In case the clearing loop finished with a “good end” we have achieved our 
goal of constructing a sufficiently bad sequence for ALG. What happens if the 
clearing loop finishes with a “standard end”? 
Claim. If Phase $k$ completes with a "standard end", then $\min P_{k+1} > \min P_k$ or $|P_{k+1}| > |P_k|$.

Before we prove the above claim, let us show how it implies the result of the lemma. The case $|P_{k+1}| > |P_k|$ can happen at most $q'$ times, so after at most $q'$ phases $\min P_k$ must increase. Our construction never decreases $\min P_k$, and the offline costs remain bounded from above by $s$. Hence, after at most $q(s - 1)q'$ phases we must have $\min P_k \ge (s - 1)q$, which implies a "good end". Each phase uses at most $(s - 1)q$ new colors. It follows that our initial set $C$ of $(s - 1)^2 q^2 q'$ colors suffices to construct the sequence $\sigma_{\mathrm{ALG},K}$. Clearly, the length of $\sigma_{\mathrm{ALG},K}$ can be bounded by a constant $L$ independent of ALG and $K$.
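The counting in the last paragraph fits on one line. $\min P_k$ rises at most $(s - 1)q$ times before reaching $(s - 1)q$, each rise takes at most $q'$ phases, and each phase consumes at most $(s - 1)q$ fresh colors:

```latex
\underbrace{(s-1)q}_{\text{rises of } \min P_k} \cdot \underbrace{q'}_{\text{phases per rise}}
  = q(s-1)q' \ \text{phases},
\qquad
q(s-1)q' \cdot \underbrace{(s-1)q}_{\text{colors per phase}} = (s-1)^2 q^2 q' = |C|.
```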

Proof (of Claim). Suppose that the sequence (1) at the beginning of the phase was ended because Case 1 occurred, i.e., ALG put one of the new items into a bin from $P_k$. In this case $\min_{b \in Q} n(b) > \min P_k$. During the clearing loop $\min_{b \in Q} n(b)$ can never decrease, and $P_{k+1}$ is initialized with the result of $Q$ at the "standard end" of the clearing loop. The claim follows.

The remaining case is that the sequence (1) was ended because of a Case 2 situation. Then $|Q| = |P_k \cup \{j\}|$ for some $j \notin P_k$, and hence $|Q| > |P_k|$. During the clearing loop, $Q$ can only decrease in size if $\min_{b \in Q} n(b)$ increases. It follows that either $|P_{k+1}| = |P_k| + 1$ or $\min P_{k+1} > \min P_k$, which is what we claimed. □

This completes the proof of the lemma. □ 

As an immediate consequence of Lemma 5.2 we obtain the following lower bound on the competitive ratio of any deterministic algorithm:

Theorem 5.3. Let $B, q, s \in \mathbb{N}$ such that $s \ge 1$ and the inequality $B/q \ge s - 1$ holds. No deterministic algorithm for OlBcp$_{B,q}$ can achieve a competitive ratio less than $\frac{s-1}{s} \cdot q$. Hence, for fixed $B$ and $q$, the competitive ratio of any deterministic algorithm is at least $\left(1 - \frac{1}{\lfloor B/q \rfloor + 1}\right) q$. In particular, consider the general case with no restrictions on the relation of the capacity $B$ to the number of bins $q$. There, no deterministic algorithm for OlBcp$_{B,q}$ can achieve a competitive ratio less than $q$. All of the above claims remain valid even if the online algorithm is allowed to use an arbitrary number $q' > q$ of open bins. □
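To see how the deterministic bound depends on $B$ and $q$, the Python sketch below takes the largest admissible $s$. Under the hypothesis $B/q \ge s - 1$ as stated here, this is $s = \lfloor B/q \rfloor + 1$. The sketch returns $\frac{s-1}{s} q$ as an exact fraction and shows the gap to ONEBIN's upper bound $\min\{2q - 1, B\}$ from Theorem 4.2. The function names are ours.

```python
from fractions import Fraction

def deterministic_lower_bound(B, q):
    """Theorem 5.3 with the largest s satisfying B/q >= s - 1."""
    s = B // q + 1
    return Fraction(s - 1, s) * q

def onebin_upper_bound(B, q):
    """Theorem 4.2: ONEBIN is min{2q - 1, B}-competitive."""
    return min(2 * q - 1, B)

for B, q in [(12, 3), (100, 3), (10**6, 3)]:
    print(B, q, float(deterministic_lower_bound(B, q)), onebin_upper_bound(B, q))
```

For $B = 12$ and $q = 3$ this gives $s = 5$ and a lower bound of $12/5 = 2.4$, compared with ONEBIN's 5. As $B$ grows, the lower bound approaches $q$ from below.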



6 A General Lower Bound for Randomized Algorithms 

In this section we show lower bounds on the competitive ratio of any randomized algorithm against an oblivious adversary for OlBcp$_{B,q}$. The basic method for deriving such a lower bound is Yao's principle (see also […]). Let $X$ be a probability distribution over input sequences $\Sigma = \{ \sigma_x : x \in X \}$. We denote the expected cost of the deterministic algorithm ALG according to the distribution $X$ on $\Sigma$ by $\mathbb{E}_X[\mathrm{ALG}(\sigma_x)]$. Yao's principle can now be stated as follows.
Theorem 6.1 (Yao's principle). Let $\{ \mathrm{ALG}_y : y \in Y \}$ denote the set of deterministic online algorithms for an online minimization problem. Let $X$ be a probability distribution over input sequences $\{ \sigma_x : x \in X \}$ such that
$$\inf_{y \in Y} \mathbb{E}_X[\mathrm{ALG}_y(\sigma_x)] \ge c \, \mathbb{E}_X[\mathrm{OPT}(\sigma_x)] \qquad (3)$$
for some real number $c \ge 1$. Then $c$ is a lower bound on the competitive ratio of any randomized algorithm against an oblivious adversary. □

Theorem 6.2. Let $B, q, s \in \mathbb{N}$ such that $s \ge 1$ and the inequality $B/q \ge s - 1$ holds. Then no randomized algorithm for OlBcp$_{B,q}$ can achieve a competitive ratio less than $\frac{s-1}{s} \cdot q$ against an oblivious adversary. In particular, for fixed $B$ and $q$, the competitive ratio against an oblivious adversary is at least $\left(1 - \frac{1}{\lfloor B/q \rfloor + 1}\right) q$. All of the above claims remain valid even if the online algorithm is allowed to use an arbitrary number $q' > q$ of open bins.

Proof. Let $\mathcal{A} := \{ \mathrm{ALG}_y : y \in Y \}$ be the set of deterministic algorithms for OlBcp$_{B,q}$. We will show that there is a probability distribution $X$ over a certain set of request sequences $\{ \sigma_x : x \in X \}$ with two properties. First, $\mathbb{E}_X[\mathrm{ALG}_y(\sigma_x)] \ge (s - 1)q$ for any $\mathrm{ALG}_y \in \mathcal{A}$. Second, $\mathbb{E}_X[\mathrm{OPT}(\sigma_x)] \le s$. The claim of the theorem then follows by Yao's principle.

Let us recall the essence of Lemma 5.2. The lemma establishes a finite color set $C$ and a constant $L$ such that, for a fixed configuration $K$, any deterministic algorithm can be "fooled" by one of at most $|C|^L$ sequences. There are no more than $|C|^{qB}$ configurations. Hence a fixed finite set of at most $N := |C|^{L + qB}$ sequences $\Sigma = \{\sigma_1, \ldots, \sigma_N\}$ suffices to "fool" any deterministic algorithm, provided the initial configuration is known.

Let Y be a probability distribution over the set of finite request sequences 
{ CTij , (7^2 , . . . , : k G N, 1 < ij < N} such that ai^ is chosen from E uni- 
formly and independently of all previous subsequences (7^^ , . . . , . We call 

subsequence CTij, the kth phase. Let ALGy G Ahe arbitrary. Define by 

Cfc := Prx [ALGy has one bin with at least (s — l)q colors during Phase k] . 

(4) 

The probability that ALGy has one bin with at least (s — l)g colors on any given 
phase is at least 1/iV, whence Ck > 1/Y for all k. Let 

Pk := Prx [ALGy((Jii • ■ -o-ifc.icrij > (s - l)g] . (5) 

Then the probabilities $p_k$ satisfy the recursion $p_0 = 0$ and $p_k = p_{k-1} + (1 - p_{k-1})c_k$.
The first term in the latter equation corresponds to the probability that
$\mathrm{ALG}_y$ has already incurred cost at least $(s-1)q$ after Phase $k-1$. The second term accounts
for the probability that this is not the case but cost at least $(s-1)q$ is achieved
in Phase $k$. By construction of $X$, these events are independent. Since $c_k \geq 1/N$,
we get $p_k \geq p_{k-1} + (1 - p_{k-1})/N$, that is, $1 - p_k \leq (1 - 1/N)(1 - p_{k-1})$. Hence
$1 - p_k \leq (1 - 1/N)^k$, so any sequence of real numbers $p_k \in [0,1]$ with this property must converge to 1. Consequently,
the expected cost $\mathbb{E}_X[\mathrm{ALG}_y(\sigma_x)]$ also converges to $(s-1)q$. On the other hand, the offline
costs remain bounded by $s$ by the choice of the $\sigma_i$ according to Lemma|^21. □
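The convergence step can be checked numerically. The following Python sketch iterates the recursion $p_k = p_{k-1} + (1 - p_{k-1})c_k$. It uses a hypothetical fooling-set size $N = 4$ in the worst case $c_k = 1/N$ that the proof allows, and compares the result with the closed form $1 - (1 - 1/N)^k$. The value of $N$ is illustrative only.

```python
def cumulative_probabilities(c):
    """Iterate p_0 = 0, p_k = p_{k-1} + (1 - p_{k-1}) * c_k; return [p_1, ..., p_K]."""
    p = 0.0
    history = []
    for c_k in c:
        p = p + (1.0 - p) * c_k
        history.append(p)
    return history


N = 4                     # hypothetical size of the fooling set S
c = [1.0 / N] * 20        # worst case permitted by the proof: c_k = 1/N
p = cumulative_probabilities(c)
# With c_k = 1/N the recursion has the closed form p_k = 1 - (1 - 1/N)^k.
print(round(p[-1], 4))    # 0.9968
```

Any larger $c_k$ only pushes $p_k$ towards 1 faster, which is why the bound $c_k \geq 1/N$ is all the proof needs.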



Online Bin Coloring 






7 Conclusions 

We have studied the online bin coloring problem OlBcp, which was motivated
by applications in a robotized assembly environment. The investigation of the
problem from a competitive analysis point of view revealed a number of oddities. A
natural greedy-type strategy (greedyfit) achieves a competitive ratio strictly
worse than arguably the most stupid algorithm (onebin). Moreover, no algorithm
can be substantially better than the trivial strategy (onebin). Even more
surprisingly, neither randomization nor "resource augmentation" (see [9, 8] for
successful applications to scheduling problems) can help to overcome the $\Omega(q)$
lower bound on the competitive ratio. Intuitively, the strategy greedyfit should perform
well "on average", which preliminary experiments with random data seem to confirm.

It remains open whether there is a deterministic (or randomized)
algorithm that achieves a competitive ratio of $q$ (matching the lower bound of
Theorems 15.31 and IQ). However, the most challenging issue raised by our work
seems to be an investigation of OlBcp from an average-case analysis point of
view.

Acknowledgements. The authors would like to thank Errol Lloyd (University 

Abstract. The first general decomposition theorem for the $k$-server
problem is presented. Whereas all previous theorems are for the case
of a finite metric with $k+1$ points, the theorem given here allows an
arbitrary number of points in the underlying metric space. This theorem
implies $O(\mathrm{polylog}(k))$-competitive randomized algorithms for certain
metric spaces consisting of a polylogarithmic number of widely separated
sub-spaces, and takes a first step towards a general $O(\mathrm{polylog}(k))$-competitive
algorithm. The only other cases for which polylogarithmic
competitive randomized algorithms are known are the uniform metric
space and the weighted cache metric space with two weights.



1 Introduction 

The $k$-server problem is one of the most intriguing problems in the area of
online algorithms [?]. Furthermore, it has as special cases several important
and practical problems. The most prominent of these is weighted caching, which
has applications in the management of web browser caches. We investigate the
randomized variant of this problem, for which very few results are known. Our
main result is a theorem which allows us to construct $O(\mathrm{polylog}(k))$-competitive
randomized algorithms for a broad class of metric spaces and provides a first
step towards a general solution.

Central to the $k$-server problem is the $k$-server conjecture, which proposes
that there is a deterministic $k$-server algorithm which is $k$-competitive for all
metrics. Although it has received less attention, the randomized $k$-server conjecture
is just as intriguing. This conjecture puts forward that there is an $O(\log k)$-competitive
randomized $k$-server algorithm for all metrics.

There has been much work on the $k$-server conjecture since it was proposed
by Manasse, McGeoch and Sleator [23]. It is easy to show a lower bound of
$k$ [?]. The best upper bound result for an arbitrary metric space is $2k-1$, due
to Koutsoupias and Papadimitriou [22]. In addition, $k$-competitive algorithms
have been exhibited for a number of special cases [?]. We refer the
reader to Chapter 4 of [?] for a more comprehensive treatment of the status of
the deterministic conjecture.

The situation for randomized algorithms is less satisfactory. Only a small
number of algorithms are known for specific metrics. These are as follows: For the
uniform metric space, matching upper and lower bounds of $H_k = 1 + \frac{1}{2} + \cdots + \frac{1}{k}$
are known. The lower bound is due to Fiat et al. [?], while the upper bound
is presented by McGeoch and Sleator [?]. An $O(\log k)$-competitive algorithm
for the weighted cache problem with 2 weights has recently been exhibited by
Irani [?]. For the case of 2 servers on the isosceles triangle, Karlin, Manasse,
McGeoch and Owicki [?] show matching upper and lower bounds. For
the case of 2 servers on the real line, an algorithm with competitive ratio at most 1.98717
has been developed by Bartal, Chrobak and Larmore [?]. Finally, the case where
we have a finite metric with $k+1$ points is closely related to the metrical task
system (MTS) problem [?]. The results on that problem [?] imply that there
is an $O(\mathrm{polylog}(k))$-competitive algorithm for every finite space with $k+1$ points.
In summary, the only two metrics for which a polylogarithmic competitive algorithm
exists for general $k$ are the uniform metric [?] and the 2-weighted cache
metric [?]. In particular, we are lacking a good randomized algorithm for the
general weighted cache problem.

The status of randomized lower bounds is also displeasing. Karloff, Rabani and
Ravid showed the first lower bound, namely $\Omega(\min\{\log k, \log\log n\})$, where $n$
is the number of points in the metric space. Karloff, Rabani and Saks improved
this by showing a lower bound of $\Omega(\sqrt{\log k/\log\log k})$ for all spaces. This has
recently been further improved to $\Omega(\log k/\log^2\log k)$ by Bartal, Bollobás and
Mendel [?]. For $k = 2$, a lower bound of $1 + 1/\sqrt{e} > 1.60653$ is presented by
Chrobak, Larmore, Lund and Reingold [?]. As mentioned before, a lower bound
of $H_k$ holds for the uniform space [?]. As pointed out by Seiden [?], the work
of Blum et al. implies a lower bound of $\log_2(k+1)$ for a certain family of metric
spaces. Note that whereas in the deterministic setting we conjecture that $k$ is
the correct bound for all spaces, the preceding discussion implies that in the
randomized setting the competitive ratio depends on the underlying space.

Our main theorem gives an $O(\mathrm{polylog}(k))$-competitive algorithm for metric
spaces which can be decomposed into a small number of widely separated sub-spaces.
The theorem may be applied recursively. As we shall argue in the next
section, we feel that this is the first step towards a resolution of the randomized
conjecture. Another important contribution of this work is in illustrating the
usefulness of unfair metrical task systems (UMTS) [26, 3, 17] as a general algorithmic
design tool. Unfair metrical task systems allow us to design "divide and
conquer" online algorithms. As far as we know, this work is the first application
of the UMTS technique outside of the metrical task system problem.



2 A Line of Attack on the k-Server Conjecture

As we have already mentioned, the $k$-server problem and metrical task system
problem are in certain special cases equivalent in terms of competitive ratio.
An important contribution of this paper is in revealing a new and important
connection between these problems. While it has long been known that the
finite $k$-server problem can in general be modeled as an MTS, the unsatisfying
result is a $\mathrm{polylog}(\binom{n}{k})$-competitive algorithm, where $n$ is the number of points
in the metric space. Our main contribution is in recognizing that if we instead
model only the most important components of the $k$-server problem using an
unfair MTS, we get a much better result.

Blum, Burch and Kalai [?] argue that the line of attack which leads to randomized
$O(\mathrm{polylog}(n))$-competitive algorithms for the MTS problem is a logical
one to use for the $k$-server problem. We also feel that this strategy is likely to
be fruitful, and the result provided here brings us one step closer to realizing it.
There are two components to the MTS line of attack:

— The metric space approximation technique developed by Bartal [?]. This
technique allows one to approximate any metric space using a specific type
of space called an $h$-hierarchical well-separated tree ($h$-HST). An $h$-HST is
a metric space with diameter $\Delta$, which recursively consists of $h$-HST sub-spaces
of diameter at most $\Delta/h$. The distance between any two points in
separate sub-spaces is $\Delta$. Specifically, Bartal gives a method of finding a
probability distribution over $h$-HSTs such that the expected distance in the
HST is within a factor of $O(h \log n \log\log n)$ of the distance in the original
space.

— Randomized algorithms for HST metric spaces [?]. The key subroutines
in these algorithms are algorithms for spaces which consist of two or more
widely separated sub-spaces. Decomposition theorems, providing upper and
lower bounds for the MTS problem on such spaces, are presented by Blum et
al. [?] and Seiden [?].

The first component carries over directly to the $k$-server problem. However,
because distances are distorted by a factor polylogarithmic in the number of
points in the space, it can (currently) only yield polylogarithmic competitive
algorithms for spaces with a polynomial number of points.

Except in the case of $k+1$ points, MTS algorithms for HST spaces cannot
be adapted to the $k$-server problem, so it seems that a new approach is
needed. Along these lines, Blum, Burch and Kalai [?] have given a first step
towards a decomposition theorem for the $k$-server problem. However, their result
is incomplete, in that no way of modeling the distribution of servers in the sub-spaces
is proposed. In this paper, we correct this problem, and give the first
working decomposition result for the $k$-server problem where the number of
points in the underlying metric space is unrestricted.

Our approach is to carefully model the $k$-server problem using a UMTS, and
show that the costs in the model differ insignificantly from the actual costs. Unfortunately,
while the basic idea is not too hard, there are many messy technical
details to be resolved.
3 Preliminaries



For a general introduction to competitive analysis we refer the reader to [?].
We define $\mathrm{cost}_A(\sigma, C)$ to be the cost incurred by $A$ on request sequence $\sigma$ starting
at initial configuration $C$. Let $\mathrm{cost}(\sigma, C)$ be the cost incurred by the optimal
offline algorithm on request sequence $\sigma$ starting at initial configuration $C$. We
need the following terminology, which is akin to the definition of constrainedness
introduced by Fiat and Mendel [?]. A randomized online algorithm $A$ is said to
be $c$-competitive and $f$-inhibited if

$$\mathbb{E}[\mathrm{cost}_A(\sigma, C)] \leq c \cdot \mathrm{cost}(\sigma, C) + f$$

for all $\sigma$ and $C$, where the expectation is taken over the random choices of the
algorithm. Since this is a worst-case measure, for the purposes of analysis we
assume that the input sequence is generated by a malicious adversary, who forces
the algorithm to perform as badly as possible. There are several types of adversaries
in the randomized scenario; we use exclusively the oblivious adversary [?].

Let $(P, d_P)$ be a metric space. We define the diameter of a non-empty set of
points $X \subseteq P$ to be $\Delta(X) = \sup_{x,y \in X} d_P(x,y)$. For each positive integer $i$ and
$X, Y \in P^i$ we define the distance between $X$ and $Y$ to be

$$d_P(X,Y) = \min_{Z \in \mu(X,Y)} \sum_{(x,y) \in Z} d_P(x,y),$$

where $\mu(X,Y)$ is the set of all maximal matchings on the complete bipartite
graph induced on $X$ and $Y$.
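As an illustration of the configuration distance, the following Python sketch computes $d_P(X, Y)$ by brute force over all perfect matchings. For equal-size configurations, the maximal matchings of the complete bipartite graph are exactly the perfect ones. The sketch assumes that configurations are tuples of points and that the metric is a Python function. Its running time is exponential in the number of servers, so it is meant only for tiny examples. An assignment-problem solver, such as SciPy's `linear_sum_assignment`, would be the polynomial-time alternative.

```python
from itertools import permutations


def config_distance(X, Y, d):
    """d_P(X, Y): minimum total length of a perfect matching between the
    equal-size configurations X and Y, found by trying every permutation of Y."""
    if len(X) != len(Y):
        raise ValueError("configurations must have the same number of servers")
    return min(sum(d(x, y) for x, y in zip(X, perm)) for perm in permutations(Y))


line = lambda a, b: abs(a - b)   # hypothetical metric: points on the real line
print(config_distance((0, 10), (9, 1), line))   # 2: match 0 with 1 and 10 with 9
```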

Let $\mathcal{U} = \{U_1, \ldots, U_t\}$ be a partition of $P$. Define $\nabla = \Delta(\mathcal{U})$ and

$$\theta(U, V) = \inf_{u \in U} \inf_{v \in V} \frac{d_P(u,v)}{\nabla}.$$

We are interested in metrics where

$$\theta = \min_{\substack{U, V \in \mathcal{U} \\ U \neq V}} \theta(U, V)$$

is large, i.e., the sub-spaces are widely separated and have small diameter relative
to the distances between them. We call such a space $\theta$-decomposable. We give an
example of a space decomposable into three sub-spaces in Figure 1.
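The separation parameter can be computed directly for a small finite example. This Python sketch assumes that $\nabla = \Delta(\mathcal{U})$ means the largest diameter of any sub-space, which the text does not spell out. It evaluates $\theta$ for three hypothetical clusters of points on the real line.

```python
from itertools import combinations


def diameter(points, d):
    """Delta(X): largest pairwise distance (0 for a single point)."""
    return max((d(x, y) for x, y in combinations(points, 2)), default=0)


def decomposability(partition, d):
    """theta = min over U != V of inf_{u in U, v in V} d(u, v) / nabla, where nabla is
    taken (an assumption of this sketch) to be the largest sub-space diameter."""
    nabla = max(diameter(U, d) for U in partition)
    if nabla == 0:
        return float("inf")          # every sub-space is a single point
    return min(
        min(d(u, v) for u in U for v in V) / nabla
        for U, V in combinations(partition, 2)
    )


line = lambda a, b: abs(a - b)
parts = [[0, 1], [100, 101], [300, 301]]
print(decomposability(parts, line))   # 99.0: the closest pair of clusters is 99 apart
```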

Note that any $h$-HST is recursively $h$-decomposable; an $h$-HST is just a special
type of decomposable space.

In the $k$-server problem, we have $k$ mobile servers, each of which is located
at some point in a metric space $(P, d_P)$. We are given $C_0 \in P^k$, the initial
configuration of our servers. We are then confronted with $\sigma = p_1, p_2, \ldots, p_n$, a
sequence of request points in $P$. Each request point must be served by moving
some server to it (if one is not already there). Formally, a configuration is a
member of $P^k$. A service is a sequence of configurations $\pi = C_1, \ldots, C_n$, where
$p_i \in C_i$ for $1 \leq i \leq n$. The cost of the service $\pi$ is

$$\mathrm{cost}(\pi, C_0) = \sum_{i=1}^{n} d_P(C_{i-1}, C_i).$$
Fig. 1. A $\theta$-decomposable space $P$ with sub-spaces $\mathcal{U} = \{U_1, U_2, U_3\}$.



Our goal is to produce a service with low cost. An algorithm is online if, in
the service it produces, $C_i$ is a function of only $p_1, \ldots, p_i$. Since it is not in
general possible to produce the optimal service online, we consider approximation
algorithms.

Let $(P, d_P)$ be a metric space with $P$ finite, and let $n = |P|$. In the metrical task
system (MTS) problem on $(P, d_P)$, there is a single mobile server. We refer to
the points of an MTS as states. We are given an initial state $s_0$ and a sequence
of tasks $\varrho = T_1, \ldots, T_N$. Each task is a function $T_i : P \to \mathbb{R}_{\geq 0} \cup \{\infty\}$. For
$s \in P$, $T_i(s)$ is the local cost for serving task $i$ in state $s$. The goal is to minimize
the sum of the local costs and distances moved by the server.

Formally, an (MTS) service is a sequence $\varpi = s_0, s_1, \ldots, s_N$ of states. The
cost of the service $\varpi$ is

$$\mathrm{cost}(\varpi, s_0) = \sum_{i=1}^{N} \big( d_P(s_{i-1}, s_i) + T_i(s_i) \big). \qquad (1)$$

An algorithm is online if $s_i$ is a function of only $T_1, \ldots, T_i$.

With respect to competitive analysis, it is also possible to consider the unfair
metrical task system (UMTS) problem on $P$. In this case, we are given real numbers $\alpha \geq 1$
and $\beta \geq 1$. The adversary pays (1) for its service, whereas the online algorithm
pays

$$\mathrm{cost}^*(\varpi, s_0) = \sum_{i=1}^{N} \big( \beta\, d_P(s_{i-1}, s_i) + \alpha\, T_i(s_i) \big).$$
A more general definition is possible; for instance, $\alpha$ might be a function of $s_i$.
However, this simple definition suffices for our purposes.
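To make the fair and unfair costs concrete, the following Python sketch evaluates equation (1) and the unfair cost $\mathrm{cost}^*$ for a given service. It assumes that each task is a list of local costs indexed by state and that the service list starts with $s_0$.

```python
def mts_cost(service, tasks, d):
    """Equation (1): sum over i of d(s_{i-1}, s_i) + T_i(s_i); service = [s_0, ..., s_N]."""
    assert len(service) == len(tasks) + 1
    return sum(d(service[i - 1], service[i]) + tasks[i - 1][service[i]]
               for i in range(1, len(service)))


def umts_cost(service, tasks, d, alpha, beta):
    """Unfair cost paid by the online algorithm: movement scaled by beta,
    local (task) costs scaled by alpha."""
    assert len(service) == len(tasks) + 1
    return sum(beta * d(service[i - 1], service[i]) + alpha * tasks[i - 1][service[i]]
               for i in range(1, len(service)))


uniform = lambda a, b: 0 if a == b else 1       # hypothetical two-state uniform metric
tasks = [[5, 0], [5, 0]]                        # state 1 is free, state 0 costs 5
print(mts_cost([0, 1, 1], tasks, uniform))                       # 1
print(umts_cost([0, 1, 1], tasks, uniform, alpha=2, beta=3))     # 3
```

With $\alpha = \beta = 1$ the two functions coincide, which matches the fact that the unfair problem generalizes the fair one.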

The following result proves to be extremely useful:

Theorem 1. There exists an $\alpha + O(\beta \log^{?} n \log^{?}\log n)$-competitive randomized
algorithm for the UMTS problem on any metric space of $n$ points.

This follows almost directly from [?]. For certain specific metrics, the upper
bound in the preceding theorem can be improved.



4 A k-Server Algorithm for Decomposable Spaces



We now develop an algorithm for any metric space that is $\theta$-decomposable. To
begin, we study the structure of the optimal offline service within an arbitrary
sub-space. We then study the "big picture", showing how to put together optimal
offline services for sub-spaces to get a complete service. This big-picture problem
is formulated as a UMTS. Finally, we show how to use an algorithm for UMTS
to get an algorithm for the original $k$-server problem.

We use the following notation: We denote the empty sequence by $\varepsilon$. For
$p \in P$, define $\sigma \wedge p$ to be the concatenation of $\sigma$ by $p$. For $1 \leq i \leq j \leq n$, we
denote $\sigma_i^j = p_i, p_{i+1}, \ldots, p_j$. For $i > j$ let $\sigma_i^j = \varepsilon$.

Let $X$ be some arbitrary non-empty subset of $P$. For $p \in X$, define $X^i(p)$ to
be the subset of $X^i$ whose members all contain $p$. We define $\sigma \cap X$ to be the
maximal subsequence of $\sigma$ containing points only from $X$.

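For $1 \leq i \leq k$, the $i$-server work function on $X$ is defined as follows:

$$w_I^C(\varepsilon, i, X) = d_P(C, I \cap X);$$

$$w_I^C(\sigma \wedge p, i, X) = \begin{cases} w_I^C(\sigma, i, X), & \text{if } p \notin X; \\ \min_{D \in X^i(p)} \{ w_I^D(\sigma, i, X) + d_P(C, D) \}, & \text{if } p \in X; \end{cases}$$

$$w_I(\sigma, i, X) = \inf_{C \in X^i} w_I^C(\sigma, i, X).$$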



This is a generalization of the work function introduced by Chrobak and Larmore
[?]. Intuitively, $w_I^C(\sigma, i, X)$ is the optimal offline cost to serve requests in
$\sigma \cap X$ using $i$ servers that stay inside $X$, starting at configuration $I$ and ending
at configuration $C$. We make use of the following facts, which follow directly
from the definition of $w$:

Fact 41 For any non-empty $X \subseteq P$; $i \geq 1$; $I, F, C, D \in X^i$ and $\sigma$ we have

$$|w_I^C(\sigma, i, X) - w_I^D(\sigma, i, X)| \leq d_P(C, D),$$

$$|w_C^F(\sigma, i, X) - w_D^F(\sigma, i, X)| \leq d_P(C, D).$$

These are known as the slope conditions [?]. Intuitively, the slope conditions
follow from the fact that to serve a sequence and end up in a certain configuration,
one could serve the sequence ending in a different configuration and then
switch configurations.
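The work function recursion can be evaluated by brute force on very small instances. In this Python sketch (all names hypothetical), configurations are sorted tuples, that is, multisets of points. The initial configuration $I$ is assumed to lie in $X^i$, so that $I \cap X = I$. The sketch computes $w_I^C$ for every $C$ on a toy line metric, and the accompanying checks test the first slope condition of Fact 41.

```python
from itertools import combinations_with_replacement, permutations


def config_distance(C, D, d):
    # Minimum-cost perfect matching between two equal-size configurations.
    return min(sum(d(c, e) for c, e in zip(C, perm)) for perm in permutations(D))


def work_function(sigma, i, X, I, d):
    """Return {C: w_I^C(sigma, i, X)} for every configuration C of i servers in X.

    Configurations are sorted tuples (multisets); I is assumed to lie in X^i.
    """
    configs = list(combinations_with_replacement(sorted(X), i))
    w = {C: config_distance(C, I, d) for C in configs}        # w_I^C(empty sequence)
    for p in sigma:
        if p not in X:
            continue                                          # request outside X: no change
        through_p = [D for D in configs if p in D]            # the set X^i(p)
        w = {C: min(w[D] + config_distance(C, D, d) for D in through_p)
             for C in configs}
    return w


line = lambda a, b: abs(a - b)
X = {0, 1, 2, 3}
w = work_function([3, 0], 2, X, (0, 1), line)
print(min(w.values()))   # w_I(sigma, 2, X) = 2: move the server at 1 to 3
```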
Fact 42 For any non-empty $X \subseteq P$; $i \geq 1$; $I, F \in X^i$; $\sigma$ and $0 \leq j \leq n$ there
exists a $C \in X^i$ such that $w_I^F(\sigma, i, X) = w_I^C(\sigma_1^j, i, X) + w_C^F(\sigma_{j+1}^n, i, X)$.

In essence, this says that the optimal offline service passes through some configuration
at each step.

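For $0 \leq i \leq k$, the $i$-server cost on $X$ of the request $p$ is defined to be

$$\tau(\sigma \wedge p, i, X, I) = \begin{cases} 0, & \text{if } p \notin X; \\ \infty, & \text{if } i = 0 \text{ and } p \in X; \\ w_I(\sigma \wedge p, i, X) - w_I(\sigma, i, X), & \text{otherwise.} \end{cases}$$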

The total optimal offline cost for serving $\sigma$ is $w_{C_0}(\sigma, k, P)$. However, rather
than using $w_{C_0}(\sigma, k, P)$ directly, we compute a lower bound on it. With this
approach our evaluation of the optimal offline cost is not exact; however, the
"big picture" becomes much clearer. This turns out to be the key to designing an
algorithm with a good competitive ratio.

Given these definitions, we now explain our approach to modeling the $k$-server
problem using a UMTS. To begin, using $(P, d_P)$ and the partition $\mathcal{U}$, we define
a new metric space $(S, d_S) = \Phi(P, d_P, \mathcal{U})$. Consider the set of all configurations
of $k$ servers on $P$. We consider two configurations $C$ and $D$ to be equivalent if
$|C \cap U| = |D \cap U|$ for all $U \in \mathcal{U}$. The set of states is the set of equivalence
classes of configurations. The number of such classes is $s = \binom{k+t-1}{t-1}$. We shall
also consider each state to be a function $\phi : \mathcal{U} \to \mathbb{Z}_{\geq 0}$, where $\phi(U)$ is the number of
servers located in $U$. We index these functions using the non-negative integers
and denote the set of all states as $S$; $\phi_0$ is the state containing $C_0$.
In order for $(S, d_S)$ to be well defined as a metric space, we require $\theta > 2k$.
If $\phi$ and $\psi$ are states then

$$d_S(\phi, \psi) = \left(1 - \frac{2k}{\theta}\right) \inf_{C \in \phi,\, D \in \psi} d_P(C, D),$$

i.e., this is $1 - 2k/\theta$ times the minimum cost for moving between a configuration
in $\phi$ and one in $\psi$.
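The states of $(S, d_S)$ are the ways of distributing $k$ indistinguishable servers among $t$ sub-spaces. This Python sketch enumerates them as tuples $(\phi(U_1), \ldots, \phi(U_t))$ and confirms the count $\binom{k+t-1}{t-1}$ given above. It assumes $t \geq 1$ and does not attempt to compute $d_S$.

```python
from math import comb


def states(k, t):
    """All functions phi: {U_1, ..., U_t} -> Z_{>=0} with total k, encoded as
    tuples (phi(U_1), ..., phi(U_t)); t >= 1 is assumed."""
    if t == 1:
        return [(k,)]
    return [(first,) + rest
            for first in range(k + 1)
            for rest in states(k - first, t - 1)]


S = states(3, 2)
print(S)                                  # [(0, 3), (1, 2), (2, 1), (3, 0)]
print(len(S), comb(3 + 2 - 1, 2 - 1))     # 4 4
```

For $t = 2$ the states are $(j, k-j)$ for $j = 0, \ldots, k$. This is consistent with the later remark that $(S, d_S)$ then behaves like $k+1$ evenly spaced points on a line.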

We can compute a lower bound on the optimal offline cost to serve the $k$-server
request sequence $\sigma$ as follows: If $\phi$ is a state, we fix $\zeta(\phi)$ to be some
arbitrary $k$-server configuration in the equivalence class of $\phi$. When calculating
the optimal offline cost for state $\phi$, we use $\zeta(\phi)$ as our starting configuration.
This seems a bit crude at first, but does not affect the overall result. Define the
task induced by $\sigma$ to be

$$T(\sigma) = \big(T(\phi_0, \sigma), T(\phi_1, \sigma), \ldots\big),$$

where

$$T(\phi, \sigma) = \sum_{U \in \mathcal{U}} \tau\big(\sigma, \phi(U), U, \zeta(\phi)\big).$$

The task sequence is $\varrho = T(\sigma_1^1), T(\sigma_1^2), \ldots, T(\sigma_1^n)$.




A General Decomposition Theorem for the k-Server Problem



Define

$$W_\phi(\varepsilon) = d_S(\phi_0, \phi),$$

$$W_\phi(\sigma \wedge p) = \min_{\psi \in S} \{ W_\psi(\sigma) + T(\psi, \sigma \wedge p) + d_S(\phi, \psi) \},$$

$$W(\sigma) = \min_{\phi \in S} W_\phi(\sigma).$$

Note that this is an $s$-state metrical task system, with task sequence $\varrho$ and initial
state $\phi_0$. The optimal offline cost for serving the tasks induced by $\sigma$ is $W(\sigma)$,
which is the MTS work function.
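The MTS work function $W$ is a simple dynamic program over the $s$ states. The following Python sketch makes three assumptions: states are indexed $0, \ldots, s-1$, each task is a list of local costs indexed by state, and $d_S$ is supplied as a function.

```python
def mts_work_function(num_states, d, phi0, tasks):
    """W_phi(eps) = d(phi0, phi);
    W_phi(seq + T) = min over psi of W_psi(seq) + T[psi] + d(phi, psi).
    Returns the final vector (W_phi for each phi) and W = min over phi of W_phi."""
    W = [d(phi0, phi) for phi in range(num_states)]
    for T in tasks:
        W = [min(W[psi] + T[psi] + d(phi, psi) for psi in range(num_states))
             for phi in range(num_states)]
    return W, min(W)


uniform = lambda a, b: 0 if a == b else 1     # hypothetical two-state metric
W_vec, W = mts_work_function(2, uniform, 0, [[5, 0]])
print(W_vec, W)   # [2, 1] 1: the cheapest option is to move to state 1 and pay nothing there
```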

Lemma 1. If $\theta > 2k$ then for all input sequences $\sigma$, $w_{C_0}(\sigma, k, P) \geq W(\sigma) - O(1)$.

Proof. Omitted due to space considerations. □

We have shown how to use the MTS $(S, d_S)$ to lower bound the optimal offline
cost for the original $k$-server problem. Now we show how to design a randomized
$k$-server algorithm for $(P, d_P)$, given that we have competitive algorithms for its
sub-spaces. Specifically, we assume that we have $\alpha$-competitive, $\eta j\nabla$-inhibited
algorithms $A(U, j)$ for the $j$-server problems on $U$, for all $U \in \mathcal{U}$ and all $j \leq k$. In order
to combine these algorithms, we make $(S, d_S)$ into a UMTS as
follows: While the optimal offline algorithm pays $T(\phi, \sigma)$ for serving $T(\sigma)$ in
state $\phi$ and $d_S(\phi, \psi)$ for moving from state $\phi$ to state $\psi$, we charge the online
algorithm $\alpha T(\phi, \sigma)$ and $\beta d_S(\phi, \psi)$, respectively.

Our algorithm, which we call Deco, operates as follows: We simulate some
algorithm Unfair for the UMTS problem on $(S, d_S)$. At each request we compute
the $i$-server costs $\tau$ on each sub-space $U \in \mathcal{U}$ for the relevant numbers of servers.
From these, we compute the task vector $T(\sigma)$.
We feed this vector to Unfair. Whenever Unfair changes state, we move servers
between sub-spaces, maintaining the invariant that if Unfair is in state $\phi$, then
the servers are in some configuration in $\phi$.

To further describe the algorithm, we define some notation. We break the
request sequence into phases. The first phase starts at the beginning of the
sequence and ends when the state of Unfair changes. In general, the $i$th phase
begins immediately after the end of phase $i-1$, and is completed when Unfair
changes state for the $i$th time. The last phase is terminated by the end of the
sequence. We define $\lambda_i$ to be the state during the $i$th phase. We can assume
without loss of generality that when the state changes, exactly one server moves
between sub-spaces; any multiple-server configuration change can be seen as a
sequence of single-server moves.
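The phase structure depends only on when Unfair changes state. The following Python sketch is one way to read that rule. It assumes (hypothetically) that we record the state of Unfair after it has processed the task of each request, and that a request after which the state changes closes the current phase.

```python
def split_into_phases(initial_state, states_after):
    """Group request indices into phases.

    states_after[j] is the (hypothetically recorded) state of Unfair after it has
    processed the task induced by request j. A phase ends at the request after
    which the state changes; the last phase is closed by the end of the sequence.
    Returns a list of (state during the phase, [request indices]).
    """
    phases = []
    current, members = initial_state, []
    for j, state in enumerate(states_after):
        members.append(j)
        if state != current:              # Unfair changed state: close this phase
            phases.append((current, members))
            current, members = state, []
    if members:                           # requests after the last state change
        phases.append((current, members))
    return phases


print(split_into_phases("a", ["a", "a", "b", "b", "c", "c"]))
# [('a', [0, 1, 2]), ('b', [3, 4]), ('c', [5])]
```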

At the beginning of the $i$th phase, we have some configuration $I_i$ of servers.
If $i = 1$ this is just the initial configuration, while if $i > 1$ this is dictated by the
behavior of the algorithm at the end of phase $i-1$.

During the phase, we use the algorithm $A(U, \lambda_i(U))$ to control the movements
of servers within $U$, for each $U \in \mathcal{U}$. At the beginning of the phase, these
algorithms are initialized and given the starting configurations $I_i \cap U$, $U \in \mathcal{U}$.
During the phase, we give each request to a point in $U$ to $A(U, \lambda_i(U))$. Within
each $U \in \mathcal{U}$, the servers of Deco mimic those of $A(U, \lambda_i(U))$ exactly.
At the end of the phase, a server moves between sub-spaces. We move it to
an arbitrary starting point in the destination sub-space. We define $F_i$ to be the
final configuration of servers before this occurs.

Lemma 2. If 

9 + 2 kf) + 2k 
^ - 9 -2k ^ 9 ’ 

then for all $\sigma$, $\mathbb{E}[\mathrm{cost}_{\mathrm{Deco}}(\sigma, C_0)] \leq \mathbb{E}[\mathrm{cost}^*_{\mathrm{Unfair}}(\varrho, \phi_0)] + O(1)$.

Proof. Fix the set of random choices made by Unfair. Once this is done, the 
behavior of Unfair is deterministic. For all sets of random choices, we show that 
the cost incurred by Unfair is within an additive constant of the cost incurred 
by Deco. Taking expectations over all possible random choices gives the desired 
result. 

Define $\ell_i$ to be the index of the last request of the $i$th phase, and $m$ to be the
number of phases. Further define $f_i = \ell_{i-1} + 1$, $\ell_0 = 0$, $\ell_{m+1} = n$ and $\lambda_{m+1} = \lambda_m$.
Let $Z_i$ be the cost incurred by Deco during the $i$th phase. For the sake
of readability, we drop the subscript $i$ in the following arguments. By the
definition of Deco we have

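$$\begin{aligned}
Z &\leq d_P(F, I_{i+1}) + \sum_{U \in \mathcal{U}} \mathrm{cost}_{A(U, \lambda(U))}(\sigma_f^\ell \cap U, I \cap U) \\
&\leq d_P(F, I_{i+1}) + k\eta\nabla + \alpha \sum_{U \in \mathcal{U}} w_I(\sigma_f^\ell, \lambda(U), U).
\end{aligned}$$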

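By Fact 42, for some configuration $D$, the cost incurred by Unfair during this
phase is

$$\begin{aligned}
&\alpha \sum_{j=f}^{\ell} T(\lambda, \sigma_1^j) + \beta d_S(\lambda, \lambda_{i+1}) \\
&= \alpha \sum_{U \in \mathcal{U}} \big( w_{\zeta(\lambda)}(\sigma_1^\ell, \lambda(U), U) - w_{\zeta(\lambda)}(\sigma_1^{f-1}, \lambda(U), U) \big) + \beta d_S(\lambda, \lambda_{i+1}) \\
&= \alpha \sum_{U \in \mathcal{U}} \big( w_D(\sigma_f^\ell, \lambda(U), U) + w_{\zeta(\lambda)}^D(\sigma_1^{f-1}, \lambda(U), U) - w_{\zeta(\lambda)}(\sigma_1^{f-1}, \lambda(U), U) \big) + \beta d_S(\lambda, \lambda_{i+1}) \\
&\geq \alpha \sum_{U \in \mathcal{U}} \big( w_D(\sigma_f^\ell, \lambda(U), U) - \lambda(U)\nabla \big) + \beta d_S(\lambda, \lambda_{i+1}) \\
&\geq \alpha \sum_{U \in \mathcal{U}} \big( w_I(\sigma_f^\ell, \lambda(U), U) - 2\lambda(U)\nabla \big) + \beta d_S(\lambda, \lambda_{i+1}) \\
&= \alpha \Big( \sum_{U \in \mathcal{U}} w_I(\sigma_f^\ell, \lambda(U), U) - 2k\nabla \Big) + \beta d_S(\lambda, \lambda_{i+1}).
\end{aligned}$$

The two inequality steps are by the slope conditions. If $\lambda_i \neq \lambda_{i+1}$ then this is at
least $Z$; $\lambda_i = \lambda_{i+1}$ is only true in the last phase. □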
Putting this all together, we get our main theorem:
Theorem 2. Let $(P, d_P)$ be a metric space and define



^ , 2fc + 2 ktjj + 2k 



$s = \binom{k+t-1}{t-1}$.



Suppose that

1. $(P, d_P)$ is $\theta$-decomposable into $U_1, \ldots, U_t$ with $\theta > 2k$;

2. we have an $(\alpha + \beta g(s))$-competitive algorithm for the UMTS problem on the
metric $(S, d_S) = \Phi(P, d_P, \mathcal{U})$;

3. we have $\alpha$-competitive, $\eta k\nabla$-inhibited algorithms for $U_1, \ldots, U_t$;

then Deco is $(\alpha + \beta g(s))$-competitive for the $k$-server problem on $(P, d_P)$.

Using Theorem 1 we get an $\alpha + O(\beta t \log^{?}(k+t) \log^{?}(t \log(k+t)))$-competitive
algorithm. The term $t \log^{?}(k+t) \log^{?}(t \log(k+t))$ is polylogarithmic in $k$ when
$t$ is polylogarithmic in $k$. For the case of $t = 2$, the metric $(S, d_S)$ is isomorphic
to $k+1$ evenly spaced points on the line. By exploiting the special structure of
this metric, we get an $\alpha + O(\beta \log^{?} k)$-competitive $k$-server algorithm.

5 Application to Specific Metrics 

In addition to having potential application towards the eventual solution of the
randomized $k$-server problem, our main theorem can be applied to get randomized
algorithms for a number of specific metric spaces. We give a few examples
here:

Suppose we have a finite metric space which is $\Omega(k \log k)$-decomposable into
$O(\log k)$ uniform sub-spaces, each with diameter 1. We use the Mark algorithm
within each sub-space. Mark is $2H_k$-competitive and $O(k \log k)$-inhibited. Then
$\eta = O(1)$, and therefore Deco is $2H_k + O(\log^{?} k \log^{?}\log k) = O(\mathrm{polylog}(k))$-competitive.

As a second example, the balanced metric space $B(2^i, \theta)$ is defined as follows:
1) $B(1, \theta)$ consists of a single point. 2) $B(2^i, \theta)$ consists of two copies of
$B(2^{i-1}, \theta)$, call them $T$ and $U$, such that $d(t, u) = \theta^i$ for all $t \in T$ and $u \in U$.
Deco is $O(\log^{?} k)$-competitive for balanced spaces for sufficiently large $\theta$.
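A balanced space is easy to realize concretely. The following Python sketch labels the $2^i$ points by the integers $0, \ldots, 2^i - 1$. As an assumption of the sketch, the two copies at each level are separated by one bit of the label. The distance therefore depends only on the highest bit in which two labels differ.

```python
def balanced_distance(a, b, theta):
    """Distance between points a, b in {0, ..., 2^i - 1} of B(2^i, theta).

    Labelling (an assumption of this sketch): at level m the copies T and U of
    B(2^(m-1), theta) are told apart by bit m-1, so two points whose highest
    differing bit is bit m-1 lie in different copies at level m and are theta^m apart.
    """
    if a == b:
        return 0
    return theta ** (a ^ b).bit_length()


theta, i = 10, 2
pts = range(2 ** i)
print([[balanced_distance(a, b, theta) for b in pts] for a in pts])
# [[0, 10, 100, 100], [10, 0, 100, 100], [100, 100, 0, 10], [100, 100, 10, 0]]
```

At the top level, each half has diameter $\theta^{i-1}$ and the halves are $\theta^i$ apart, so the space is $\theta$-decomposable into its two copies.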

6 Conclusions 

We have presented the first general decomposition theorem for the $k$-server problem.
We feel that such theorems will inevitably be a part of the final resolution
of the randomized $k$-server conjecture. It is our hope that the result presented
here provides the impetus for further work on this fascinating problem.

Some steps in the right direction are as follows: 1) The construction of Bartal
allows us to $O(h \log n \log\log n)$-approximate any metric with $h$-HSTs.
While our decomposition theorem can be applied to give algorithms for $h$-HSTs,
we require that $h = \theta > 2k$. To get an algorithm for general spaces, we need a
decomposition theorem which can be applied to $h$-HSTs with $h$ polylogarithmic
in $k$. 2) Another direction of progress would be to improve the dependence of
the competitive ratio on the number of sub-spaces in the decomposition. Bartal's
construction gives no upper limit on the number of such sub-spaces, i.e.,
we might potentially have an $h$-HST which decomposes into $O(n)$ sub-spaces.
Our decomposition theorem has a competitive ratio which is super-linear in
the number of sub-spaces. Thus we need either to modify Bartal's technique to
guarantee a restricted number of sub-spaces at each $h$-HST level, or to improve the
competitive ratio guarantee of the decomposition theorem to be polylogarithmic
in the number of sub-spaces. If we could overcome these two shortcomings, we
would have an $O(\mathrm{polylog}(k))$-competitive $k$-server algorithm for any metric space
on $n = O(\mathrm{poly}(k))$ points.

We also feel that the result presented here is important in that it illustrates 
how unfair metrical task systems can be used to design randomized online algo- 
rithms. The only previous application was in the design of MTS algorithms. We 
believe that the UMTS technique should have much wider applicability. 

Abstract. We consider a variant of the online paging problem where
the online algorithm may buy additional cache slots at a certain cost.
The overall cost incurred equals the total cost for the cache plus the
number of page faults. This problem and our results are a generalization
of both the classical paging problem and the ski rental problem.

We derive the following three tight results: (1) For the case where the
cache cost depends linearly on the cache size, we give a $\lambda$-competitive
online algorithm where $\lambda \approx 3.14619$ is a solution of $\lambda = 2 + \ln\lambda$. This
competitive ratio $\lambda$ is best possible. (2) For the case where the cache cost
grows like a polynomial of degree $d$ in the cache size, we give an online
algorithm whose competitive ratio behaves like $d/\ln d + o(d/\ln d)$. No
online algorithm can reach a competitive ratio better than $d/\ln d$. (3)
We exactly characterize the class of cache cost functions for which there
exist online algorithms with finite competitive ratios.
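The constant $\lambda$ can be recovered numerically. The equation $\lambda = 2 + \ln\lambda$ has two positive roots; the other one is close to 0.16. The following Python sketch finds the larger root by iterating $\lambda \leftarrow 2 + \ln\lambda$ from $\lambda = 3$. The iteration converges because the map has derivative $1/\lambda < 1$ there.

```python
import math


def paging_ratio(tol=1e-13, max_iter=1000):
    """Larger root of lam = 2 + ln(lam), via the fixed-point iteration lam <- 2 + ln(lam).
    Near this root the map has derivative 1/lam < 1, so the iteration contracts."""
    lam = 3.0
    for _ in range(max_iter):
        nxt = 2.0 + math.log(lam)
        if abs(nxt - lam) < tol:
            return nxt
        lam = nxt
    return lam


lam = paging_ratio()
print(f"{lam:.5f}")   # 3.14619
```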

1 Introduction 

The classical problem. The paging problem considers a two-level memory system
where the first level (the cache) can hold $k$ pages, and where the second level
(the slow memory) can store $n$ pages. The $n$ pages ($n \gg k$) in slow memory represent
virtual memory pages. A paging algorithm is confronted with a sequence
of requests to virtual memory pages. If the page requested is in the cache (a page
hit), no cost is incurred; but if the page is not in the cache (a page fault), then
the algorithm must bring it into the cache at unit cost. Moreover, the algorithm
must decide which of the $k$ pages currently in cache to evict in order to make
room for the newly requested page.

The paging problem has inspired several decades of theoretical and applied
research and has now become a classical problem in computer science. This is
due to the fact that managing a two-level store of memory has long been, and
continues to be, a fundamentally important problem in computing systems. The
paging problem has also been one of the cornerstones in the development of the
area of online algorithms. Starting with the seminal work of Sleator & Tarjan [7],
which initiated the recent interest in the competitive analysis of online algorithms,
the paging problem has motivated the development of many important
innovations in this area.

In the offline version of the paging problem, the request sequence is known to
the algorithm a priori. Belady [?] gives a polynomial time optimal offline
algorithm for paging. In the online version of the paging problem, each request
must be served without any knowledge of future requests. An online algorithm
is $R$-competitive if on all possible request sequences the ratio of the algorithm's
cost to the optimal offline cost is bounded by the constant $R$. The competitive
ratio of an online paging algorithm is the smallest such constant $R$. Sleator &
Tarjan [7] show that for cache size $k$, the online algorithm LRU (which always
evicts the least recently used page) has a competitive ratio of $k$ and that no
better ratio is possible. In fact, they prove the following more general result that
we will use many times in this paper.

Proposition 1. (Sleator & Tarjan [7])

If LRU with cache size $k$ is compared against an optimal offline algorithm with
cache size $\ell$, then LRU has a competitive ratio of $k/(k - \ell + 1)$.

No better competitive ratio is possible: For every online algorithm $A$ with
cache size $k$, there exist arbitrarily long request sequences $\sigma$ such that $A$ faults
on every page of $\sigma$, whereas the optimal offline algorithm with cache size $\ell$ only
faults on a fraction $(k - \ell + 1)/k$ of $\sigma$. □
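The lower-bound pattern in Proposition 1 can be observed for $\ell = k$ with a small simulation. The following Python sketch counts the faults of LRU and of Belady's offline rule on a sequence that cycles through $k+1$ pages. LRU faults on every request. Belady's rule faults on the first $k$ requests and afterwards only once every $k$ requests. The function names are illustrative.

```python
from collections import OrderedDict


def lru_faults(requests, k):
    """Number of page faults of LRU with a cache of k slots."""
    if k == 0:
        return len(requests)
    cache = OrderedDict()              # keys ordered from least to most recently used
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)    # hit: page becomes the most recently used
        else:
            faults += 1
            if len(cache) == k:
                cache.popitem(last=False)   # evict the least recently used page
            cache[page] = None
    return faults


def belady_faults(requests, k):
    """Faults of Belady's offline rule: evict the page requested furthest in the future."""
    if k == 0:
        return len(requests)
    cache, faults = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                for j in range(i + 1, len(requests)):
                    if requests[j] == q:
                        return j
                return float("inf")    # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults


k = 4
sigma = [i % (k + 1) for i in range(100)]              # cycle through k + 1 pages
print(lru_faults(sigma, k), belady_faults(sigma, k))   # 100 28
```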

For more information on online paging we refer the reader to the survey chapter
of Irani [?] and to the book of Borodin & El-Yaniv [?].

The problem considered in this paper. In all previous work on online paging
a basic assumption was that the cache size $k$ is fixed a priori and cannot be
changed. In this paper, we consider the situation where at any moment in time
the algorithm may increase its cache size by purchasing additional cache slots.
If its final cache size is $x$, then it is charged a cost $c(x)$ for it. Here $c : \mathbb{N} \to \mathbb{R}_{\geq 0}$
is a non-decreasing, non-negative cost function that is a priori known to
the algorithm. An equivalent way of stating this problem is the following: The
algorithm starts with no cache. At each request, the algorithm may increase its
cache from its current size $x_1$ to a larger size $x_2$ at a cost of $c(x_2) - c(x_1)$.

We have three basic motivations for looking at this problem. First, and most
importantly, the classical online analysis of paging is criticized for being overly
pessimistic. For instance, LRU (and all other deterministic algorithms) cannot
have a competitive ratio smaller than the size of the cache, which can be
arbitrarily large. However, in practice LRU typically performs within a small constant
ratio of optimal for all cache sizes. For any particular instance of our problem any
algorithm will either have a constant competitive ratio or not be competitive.
Second, we see this version as a first approximation to the problem of deciding
when and how to upgrade memory systems. Though our version ignores many
practical concerns, it incorporates enough to illustrate ideas like amortizing cost
over time and balancing cost against performance. Third, we can model a system
where a portion of memory is reserved for the use of one high-priority process
(or set of processes) and balance the faults incurred by this process against the
decreasing system performance for all other processes.

The offline version of this problem is quite easy to solve: For a given cost
function $c(k)$ and a given request sequence $\sigma$, there are only a polynomial number
of possible optimal cache sizes (from 0 to the length of the request sequence). All
of them can be checked in polynomial time by computing the cost of Belady’s
algorithm on the sequence $\sigma$ with this cache size, and by adding the cache
cost to the result. The best such solution yields the global optimum. Note that
for many special cost functions $c(k)$ (e.g., convex functions) there are even faster
offline methods for finding the optimal cache size.
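The brute-force offline method just described can be sketched as follows, assuming Belady's rule in its furthest-next-request form. The name `offline_optimum` is hypothetical, and the cost function $c(x)=2x$ in the example is an arbitrary choice. The sketch is deliberately naive (it reruns Belady's rule for every cache size) and only shows the structure of the search, not the faster methods mentioned for special cost functions.

```python
def belady_faults(requests, k):
    """Faults of Belady's rule (evict the page whose next request is furthest away)."""
    if k == 0:
        return len(requests)
    cache, faults = set(), 0
    for t, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == k:
            def next_use(q):
                later = [s for s in range(t + 1, len(requests)) if requests[s] == q]
                return later[0] if later else float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults


def offline_optimum(requests, c):
    """Try every cache size x = 0, ..., len(requests): Belady's faults plus c(x).
    Returns (minimum total cost, smallest cache size attaining it)."""
    return min((belady_faults(requests, x) + c(x), x) for x in range(len(requests) + 1))


print(offline_optimum([1, 2, 3, 4] * 5, lambda x: 2 * x))  # (12, 4)
```

With four distinct pages, buying four slots costs 8 and leaves only the 4 unavoidable first-request faults, which beats every smaller cache in this example.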

Now let us turn to the online version of this problem. If the function $c(x)$
has the form $c(x)=0$ for $x\le k$ and $c(x)=\infty$ for $x>k$, then we are back at
classical paging with cache size $k$. Moreover, our problem captures some features
of classical renting-versus-buying problems (the simplest instance of which
is the well-known ski rental problem; see Karlin, Manasse, McGeoch & Owicki).
Imreh & Noga analyze machine scheduling problems where the online
algorithm may adjust its resources at an additional cost.

Notation. For a fixed algorithm $A$ and a given request sequence $\sigma$, we denote
by $\mathrm{fault}_A(\sigma)$ the number of page faults that $A$ incurs on $\sigma$, and we denote by
$\mathrm{cache}_A(\sigma)$ the cost $c(x)$ where $x$ is the final cache size of $A$ on $\sigma$. Moreover,
we define $\mathrm{cost}_A(\sigma)=\mathrm{fault}_A(\sigma)+\mathrm{cache}_A(\sigma)$. An online algorithm $A$ is called
$R$-competitive if there is a fixed constant $b$ such that for all request sequences $\sigma$
we have

$$\mathrm{cost}_A(\sigma)\le R\cdot\mathrm{cost}_{\mathrm{OPT}}(\sigma)+b. \qquad (1)$$

The smallest $R$ for which an online algorithm is $R$-competitive is its competitive
ratio. The optimal offline algorithm will be denoted by OPT.

Organization of the paper. In Section 2 we summarize and explain our three main
results. The proof of the result on linear cost functions can be found in Sections 3
(proof of the positive result) and 4 (proof of the negative result), respectively. The
results on polynomial cost functions are proved in Section 5. The characterization
of cost functions that allow finite competitive online algorithms is proved in
Section 6. Finally, we end with a short conclusion in Section 7.



2 Our Results 

We start by discussing several ‘natural’ cost functions $c(x)$. In the simplest case
the cost of the cache is proportional to its size and each cache slot costs much
more than a single page fault.
Theorem 1. Assume that the cache cost is $c(x)=\alpha x$ where $\alpha$ is some positive
real. Let $\lambda\approx 3.14619$ be the largest real solution of

$$\lambda=2+\ln\lambda. \qquad (2)$$

Then there exists an online algorithm for the cache purchase problem with
competitive ratio $\lambda$. For any $r<\lambda$ there exists an $\alpha$ such that no online algorithm
can be $r$-competitive.

Equation (2) has two real solutions, where the smaller one is approximately
0.15859 and where the larger one is $\lambda\approx 3.14619$. For users of MAPLE we remark
that $\lambda=-W_{-1}(-e^{-2})$, where $W_{-1}$ is the lower real branch of the well-known Lambert W function.
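The two roots of (2) and the parameter $\beta=1/\ln\lambda$ used in Section 3 can be checked numerically. This sketch uses plain fixed-point iteration from Python's standard library rather than a Lambert W implementation; the starting points and iteration counts are our own choices.

```python
import math

# Larger root of x = 2 + ln x: near it the map x -> 2 + ln x has derivative
# 1/x < 1/3, so fixed-point iteration from x = 3 converges.
lam = 3.0
for _ in range(100):
    lam = 2.0 + math.log(lam)

# Smaller root: rewrite x = 2 + ln x as x = exp(x - 2); near the small root
# this map has derivative exp(x - 2) < 1/6, so iteration from 0.5 converges.
small = 0.5
for _ in range(100):
    small = math.exp(small - 2.0)

beta = 1.0 / math.log(lam)
print(f"lambda = {lam:.5f}, smaller root = {small:.5f}, beta = {beta:.5f}")
```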

We also note that in the (easier, but somewhat unreasonable) case where
cache slots are not much more expensive than a single fault (i.e., where $\alpha$ is close
to 1), an online algorithm can be $(1+\alpha)$-competitive by simply purchasing a
cache location for each new item requested and never evicting any item. Besides
linear cost functions, another fairly natural special case considers polynomial
cost functions.

Theorem 2. Assume that the cache cost is $c(x)=x^d$ with $d\ge 2$. Then there
exists an online algorithm for the cache purchase problem whose competitive
ratio grows like $d/\ln d+o(d/\ln d)$. Moreover, no online algorithm can reach a
competitive ratio that is better than $d/\ln d$.

Finally, we will exactly characterize the class of cost functions c for which 
there exist online algorithms with finite competitive ratios. 

Theorem 3. There exists an online algorithm with finite competitive ratio for
online paging with cache purchasing cost $c(x)$ if and only if

$$\exists q>1\ \exists p>0\ \exists s>0\ \exists X>0\ \forall x\ge X:\ c(qx)\le p\cdot c(x)+s. \qquad (3)$$

One way to interpret the condition described above is that the cost function 
c(x) must be polynomially bounded. 
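Condition (3) quantifies over all $x\ge X$, so it cannot be verified by computation, but a finite spot check illustrates the difference between polynomially bounded and faster-growing cost functions. The helper `holds_on_range` and the witnesses $q=2$, $p=8$, $s=0$ for $x^3$ are our own illustrative choices.

```python
def holds_on_range(c, q, p, s, X, x_max):
    """Check c(q*x) <= p*c(x) + s for the integers X <= x <= x_max.
    This only samples condition (3); it cannot prove the 'for all x' part."""
    return all(c(q * x) <= p * c(x) + s for x in range(X, x_max + 1))


def cubic(x):
    return x ** 3  # polynomially bounded: (2x)^3 = 8 x^3


def expo(x):
    return 2 ** x  # not polynomially bounded: c(2x) / c(x) = 2^x is unbounded


print(holds_on_range(cubic, 2, 8, 0, 1, 1000))     # True
print(holds_on_range(expo, 2, 1000, 0, 1, 1000))   # False once 2^x > 1000
```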



3 The Positive Result for Linear Cost Functions 

In this section we prove the positive result claimed in Theorem 1 for linear cost
functions of the form $c(x)=\alpha x$ with $\alpha>0$. We consider the online algorithm
BETA that uses LRU as its paging strategy, and that increases its cache size to
$\ell$ as soon as $\mathrm{fault}_{\mathrm{BETA}}\ge\alpha\beta(\ell-1)$ holds. Here the critical parameter $\beta$ equals
$1/\ln\lambda$ where $\lambda$ was defined in equation (2). Then $\beta\approx 0.87245$, and it can be
checked that $\beta$ is the unique positive root of $2+1/\beta=e^{1/\beta}$.
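A minimal simulation of BETA for $c(x)=\alpha x$ follows, assuming that a newly bought slot starts empty and that purchases are checked right after each fault. The name `run_beta` is hypothetical and the cyclic request sequence is an arbitrary test input. By construction the final cache size $k$ satisfies $\alpha\beta(k-1)\le\mathrm{fault}_{\mathrm{BETA}}<\alpha\beta k$, which is the bound used in the proof below.

```python
import math
from collections import OrderedDict

lam = 3.0
for _ in range(100):
    lam = 2.0 + math.log(lam)   # lambda from equation (2)
BETA = 1.0 / math.log(lam)      # beta = 1 / ln(lambda)


def run_beta(requests, alpha, beta=BETA):
    """Sketch of BETA for c(x) = alpha * x: LRU paging, and the cache is grown to
    size l as soon as faults >= alpha * beta * (l - 1).
    Returns (final cache size k, number of faults, total cost alpha*k + faults)."""
    assert alpha * beta > 0
    size = 1                    # the threshold for l = 1 is 0, met at once
    faults = 0
    cache = OrderedDict()       # LRU order: least recently used first
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
            continue
        faults += 1
        if len(cache) == size:
            cache.popitem(last=False)
        cache[page] = None
        while faults >= alpha * beta * size:   # buy slot size + 1
            size += 1
    return size, faults, alpha * size + faults


k, faults, cost = run_beta(list(range(50)) * 20, alpha=10)
print(k, faults, cost)
```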

Lemma 1. If $\lambda$ is defined as in (2) and $\beta=1/\ln\lambda$, then any real $y>0$ satisfies
the inequality

$$(\beta+1)\,y\le\lambda\,(1+\beta y-\beta-\beta\ln y).$$
Proof. First note that by the definition of $\beta$ and by (2) we have $\lambda-2=\ln\lambda=1/\beta$,
and hence $(\beta\lambda-\beta-1)/(\beta\lambda)=1/\lambda$. Next
we will apply the well-known inequality $z+1\le e^z$ for real numbers $z$. Setting
$z=y\cdot(\beta\lambda-\beta-1)/(\beta\lambda)-1$ in this inequality, taking logarithms, and applying some
algebra yields

$$\ln y\;\le\;\ln\frac{\beta\lambda}{\beta\lambda-\beta-1}+\frac{\beta\lambda-\beta-1}{\beta\lambda}\,y-1\;=\;\frac{1}{\beta}+\frac{\beta\lambda-\beta-1}{\beta\lambda}\,y-1.$$

Multiplying by $\beta\lambda>0$ and rearranging, this inequality leads to the claimed inequality. □
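Lemma 1 can be sanity-checked numerically. The sketch below evaluates the ratio $(\beta+1)y/(1+\beta y-\beta-\beta\ln y)$ on a logarithmic grid of our own choosing; a finite check is of course no proof. The bound is tight at $y=\lambda$, where the substitution in the proof gives $z=0$ and $z+1\le e^z$ holds with equality.

```python
import math

lam = 3.0
for _ in range(100):
    lam = 2.0 + math.log(lam)   # lambda from (2)
beta = 1.0 / math.log(lam)


def ratio(y):
    """(beta + 1) y / (1 + beta y - beta - beta ln y); Lemma 1 says this is <= lambda.
    The denominator equals 1 + beta (y - 1 - ln y) >= 1, so it is always positive."""
    return (beta + 1.0) * y / (1.0 + beta * y - beta - beta * math.log(y))


ys = [10 ** (e / 1000.0) for e in range(-3000, 3001)]   # y from 1e-3 to 1e3
print(max(ratio(y) for y in ys), ratio(lam), lam)
```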



Now consider an arbitrary request sequence $\sigma$. Let $k$ denote the final cache
size of BETA when fed with $\sigma$, and let $\ell$ denote the optimal offline cache size for
$\sigma$. We denote by $f_i$ the total number of page faults that BETA incurs up to the
moment when it purchases the $i$th cache slot. Then between buying the $i$th
and the $(i+1)$th slot, BETA has $f_{i+1}-f_i=\alpha\beta$ faults.

Lemma 2. For a sequence $\sigma$ on which BETA purchases a cache of size $k$ and
OPT uses a cache of size $\ell$,

$$\mathrm{fault}_{\mathrm{OPT}}(\sigma)\;\ge\;\sum_{i=\ell}^{k-1}(f_{i+1}-f_i)\,\frac{i-\ell+1}{i}.$$




Proof. If $\ell\ge k$ then the lemma is trivial.

For $\ell<k$, we prove the lemma by modifying the standard technique of
dividing the request sequence into phases. The first phase will begin with the
first request. If at some point $i$ is the current size of BETA’s cache, $i$ distinct
items have been requested during the current phase, and an item distinct from
the $i$ items requested during this phase is the next request, then end the current
phase and begin the next phase with the next request.

Consider a given phase which ends with BETA’s cache size equal to $i$. Since
BETA uses LRU as its paging strategy and because of the way a phase is ended,
during this phase BETA will fault on any item at most once. On the other hand,
between the second request in this phase and the first request of the next phase
the optimal algorithm must fault at least $i-\ell+1$ times. If the size of BETA’s
cache was also $i$ at the beginning of the phase then this phase contributes exactly
what is needed to the sum. If however BETA’s cache size was smaller then this
phase contributes (slightly) more than is necessary to the sum. □

We now prove the positive result in Theorem 1.



Proof. First assume that $k\le\ell$. Since the offline algorithm purchases $\ell$ cache slots, the offline
cost is at least $\alpha\ell$. Since BETA did not purchase the $(k+1)$th cache slot, at
the end $\mathrm{fault}_{\mathrm{BETA}}(\sigma)<\alpha\beta k$ and $\mathrm{cache}_{\mathrm{BETA}}(\sigma)=\alpha k$. Since $\beta+1<\lambda$, the cost
of BETA is at most a factor of $\lambda$ above the optimal offline cost.
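Now assume that $k>\ell$. Using the previous lemma:

$$\begin{aligned}
\mathrm{cost}_{\mathrm{OPT}}(\sigma) &= \mathrm{cache}_{\mathrm{OPT}}(\sigma)+\mathrm{fault}_{\mathrm{OPT}}(\sigma)\\
&\ge \alpha\ell+\sum_{i=\ell}^{k-1}(f_{i+1}-f_i)\,\frac{i-\ell+1}{i}\\
&= \alpha\ell+\alpha\beta\sum_{i=\ell}^{k-1}\frac{i-\ell+1}{i}\\
&\ge \alpha\ell+\alpha\beta\Bigl(k-\ell-\ell\ln\frac{k}{\ell}\Bigr).
\end{aligned}$$

(The last step uses $(i-\ell+1)/i\ge 1-\ell/(i+1)$ and $\sum_{i=\ell+1}^{k}1/i\le\ln(k/\ell)$.)
Since the online cache cost is $\alpha k$ and the online fault cost is at most $\alpha\beta k$, by
substituting $y=k/\ell>1$ we have

$$\frac{\mathrm{cost}_{\mathrm{BETA}}(\sigma)}{\mathrm{cost}_{\mathrm{OPT}}(\sigma)}\le\frac{\alpha(\beta+1)k}{\alpha\ell+\alpha\beta\bigl(k-\ell-\ell\ln(k/\ell)\bigr)}=\frac{(\beta+1)y}{1+\beta y-\beta-\beta\ln y}\le\lambda.$$

Here we used Lemma 1 to get the final inequality. Therefore, algorithm BETA
indeed is a $\lambda$-competitive online algorithm for paging with cache purchasing cost
$c(x)=\alpha x$. This completes the proof of the positive statement in Theorem 1. □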

4 The Negative Result for Linear Cost Functions 

In this section we prove the negative result claimed in Theorem 1 for linear cost
functions of the form $c(x)=\alpha x$ with $\alpha\gg 1$. The proof is done by an adversary
argument.

Assume that there is an online algorithm $A$ which is $r$-competitive for some
$1<r<\lambda$ and that $\alpha$ is very large. Further, assume that the pages are numbered
$1,2,3,\ldots$. The adversary always requests the smallest numbered page which is
not in the online cache. Thus, the online algorithm faults on every request. Let
$\sigma$ be the infinite sequence of requests generated by this procedure.
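The adversary is easy to simulate against any online algorithm whose cache contents are visible. In the sketch below the online algorithm is a hypothetical `ThresholdLRU` (our own illustration: LRU paging, buying the $i$th slot once $t(i-1)$ faults have occurred, for a parameter $t>0$); any other purchase rule could be substituted. The list `purchase_times` records $f_1,f_2,\ldots$ as defined next.

```python
from collections import OrderedDict


class ThresholdLRU:
    """Hypothetical online algorithm: LRU paging; the i-th cache slot is
    bought as soon as the number of faults reaches t * (i - 1)."""

    def __init__(self, t):
        assert t > 0
        self.t = t
        self.size = 1                # threshold t * 0 = 0 is met at once
        self.faults = 0
        self.cache = OrderedDict()   # LRU order: least recently used first
        self.purchase_times = [0]    # f_1 = 0

    def serve(self, page):
        """Serve one request; return True if it was a fault."""
        if page in self.cache:
            self.cache.move_to_end(page)
            return False
        self.faults += 1
        if len(self.cache) == self.size:
            self.cache.popitem(last=False)
        self.cache[page] = None
        while self.faults >= self.t * self.size:
            self.size += 1
            self.purchase_times.append(self.faults)
        return True


def adversary(alg, n):
    """Issue n requests, each for the smallest-numbered page (1, 2, 3, ...)
    that is not in alg's cache, so that every request is a fault."""
    requests = []
    for _ in range(n):
        page = 1
        while page in alg.cache:
            page += 1
        requests.append(page)
        alg.serve(page)
    return requests


alg = ThresholdLRU(t=3.0)
reqs = adversary(alg, 300)
print(alg.faults, alg.size, alg.purchase_times[:5])
```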

In order to be finitely competitive, the online algorithm cannot have any
fixed upper bound on the size of its cache; hence, the number of purchased slots
is unbounded. Let $f_i$ be the number of requests (equivalently, the number of page
faults) which precede the purchase of the $i$th cache slot. Note that $f_1=0$, and
that $f_1,f_2,\ldots$ is a monotone non-decreasing integer sequence. The cost to $A$ for
the requests $1,\ldots,f_i$ equals $i\alpha+f_i$.

We will now consider the adversary’s cost for the prefix $\sigma_i$ consisting of the requests
$1,\ldots,f_i$. Guided by the results of Sleator & Tarjan (see Proposition 1) we upper bound the
adversary’s cost in the following lemma:

Lemma 3. If $A$ purchases the $j$th cache slot after $f_j$ faults and OPT uses a
cache of size $\ell$, then

$$\mathrm{fault}_{\mathrm{OPT}}(\sigma_i)\;\le\;i+\sum_{j=\ell}^{i-1}(f_{j+1}-f_j)\,\frac{j-\ell+1}{j}.$$
Proof. Any algorithm will fault the first time a particular page is requested.
These (at most $i$) faults correspond to the first term in the right-hand side of the above
inequality. Below we will explicitly exclude these faults from consideration.

As in Lemma 2 we divide the request sequence into phases. However, this
time we do this in a slightly different way. The first phase begins with the first
request. If $j$ is the size of $A$’s cache and $j$ requests have been made during the
current phase, then the current phase ends and the next request begins a new
phase. Note that this differs slightly from Lemma 2, since the first request in a
phase need not be distinct from all requests in the previous phase.

Consider a phase which begins with the size of $A$’s cache equal to $j$ and ends
with the size of $A$’s cache equal to $j'$. During this phase there will be $j'$ requests
from items $1,2,\ldots,j'+1$. Note that none of the items labeled $j+2,\ldots,j'+1$
will have been requested in any previous phase, since the size of $A$’s cache prior
to this phase was $j$. Recall that the first fault on each of these $j'-j$ items has
already been counted.

From this point we proceed in a fairly standard manner. We claim that an
optimal offline algorithm will incur at most $j'-\ell+1$ faults. Note that this is
equivalent to stating that the optimal offline algorithm will not fault on at least $\ell-1$
requests.

If possible, whenever the offline algorithm faults it evicts an item which will
not be requested in the remainder of this phase. If at some point it is not possible
for the offline algorithm to evict such an item, then all $\ell$ items in its cache will
be requested later in this phase. In this case, it is easy to see that there will
be at least $\ell-1$ requests on which the offline algorithm will not fault. On the
other hand, if the offline algorithm is able to evict $j'-\ell+1$ items which will
not be requested later in this phase, then its cache contains all of the (at most $\ell$
distinct) items which will be requested during the remainder of the phase.

Of the $j'-\ell+1$ faults which the offline algorithm will incur during this phase,
$j'-j$ faults correspond to the first time an item is requested (these $j'-j$ faults
have already been counted). So this phase will contribute $j-\ell+1$ faults to the
sum. If $j'=j$ then this phase contributes precisely what is claimed. If instead
$j'>j$ this phase contributes (slightly) less. □

We now prove the lower bound of $\lambda$ on any online algorithm.



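Proof. Fix an online algorithm $A$. For a given $\alpha$, if $f_i/i$ is not bounded above
then $A$ cannot have a constant competitive ratio. Clearly, $f_i/i$ is bounded
below by 0 (for $i\ge 1$). So $L=\liminf_{i\to\infty}f_i/i$ exists. Suppose that the adversary
initially purchases $i/\lambda$ cache locations and serves $\sigma_i$ with only these locations
(for simplicity we treat $M/\lambda$ below as an integer).
From the definition of $L$, we know that for any $\varepsilon>0$ there are arbitrarily large
$M$ such that $f_M/M\le L+\varepsilon$ and $f_i/i\ge L-\varepsilon$ for all $i\ge M/\lambda$. Further, for sufficiently large $M$,
$\bigl|\sum_{j=M/\lambda+1}^{M-1}\frac{1}{j-1}-\ln\lambda\bigr|<\varepsilon$. Using the previous lemma with $i=M$ and $\ell=M/\lambda$,
followed by summation by parts, we get

$$\begin{aligned}
\frac{\mathrm{cost}_A(\sigma_M)-b}{\mathrm{cost}_{\mathrm{OPT}}(\sigma_M)}
&\ge\frac{\alpha M+f_M-b}{\alpha M/\lambda+M+\sum_{j=M/\lambda}^{M-1}(f_{j+1}-f_j)\,\frac{j-M/\lambda+1}{j}}\\
&=\frac{\alpha M+f_M-b}{\alpha M/\lambda+M+f_M\,\frac{M-M/\lambda}{M-1}-f_{M/\lambda}\,\frac{\lambda}{M}-\sum_{j=M/\lambda+1}^{M-1}f_j\,\frac{M/\lambda-1}{j(j-1)}}\\
&\ge\frac{\alpha+L-\varepsilon-b/M}{\alpha/\lambda+1+(L+\varepsilon)\,\frac{M-M/\lambda}{M-1}-(L-\varepsilon)\bigl(\frac{1}{\lambda}-\frac{1}{M}\bigr)(\ln\lambda-\varepsilon)},
\end{aligned}$$

where the last step divides numerator and denominator by $M$, uses the bounds on
$f_M/M$, $f_j/j$ and on the harmonic sum, and drops the nonpositive term $-f_{M/\lambda}\lambda/M$.
As $M\to\infty$ and $\varepsilon\to 0$, the right-hand side tends to

$$\frac{\alpha+L}{\alpha/\lambda+1+L\bigl(1-\frac{1}{\lambda}-\frac{1}{\lambda}\ln\lambda\bigr)}=\frac{\alpha+L}{\alpha/\lambda+1+L/\lambda}\;\ge\;\frac{\lambda\alpha}{\alpha+\lambda}.$$

In the final equality we have used that $1/\lambda=1-1/\lambda-(1/\lambda)\ln\lambda$, which is equivalent to (2);
the last inequality holds because the middle expression is increasing in $L\ge 0$. For $\alpha$ and
$M$ sufficiently large, the final value can be made arbitrarily close to $\lambda$; in particular it
exceeds $r$, which contradicts the $r$-competitiveness of $A$. □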
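The limiting expression at the end of this proof, as reconstructed here, can be evaluated directly. The sketch checks the identity $1/\lambda=1-1/\lambda-(1/\lambda)\ln\lambda$ and shows that the minimum over $L\ge 0$ of $(\alpha+L)/(\alpha/\lambda+1+L/\lambda)$ is attained at $L=0$ and approaches $\lambda$ as $\alpha$ grows. The name `limit_ratio` is hypothetical and the sampled values of $\alpha$ and $L$ are arbitrary.

```python
import math

lam = 3.0
for _ in range(100):
    lam = 2.0 + math.log(lam)   # lambda from (2)

# Identity used in the last step; it is equivalent to lambda = 2 + ln(lambda).
assert abs(1 / lam - (1 - 1 / lam - math.log(lam) / lam)) < 1e-12


def limit_ratio(alpha, L):
    """Limit of the lower bound as M -> infinity and eps -> 0."""
    return (alpha + L) / (alpha / lam + 1 + L / lam)


for alpha in (10, 100, 10_000, 10 ** 6):
    worst = min(limit_ratio(alpha, L) for L in range(0, 1001))
    print(alpha, round(worst, 6), round(lam * alpha / (alpha + lam), 6))
```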



5 The Results for Polynomial Cost Functions 

In this section we prove the two results (one positive and one negative) that are
claimed in Theorem 2 for cost functions of the form $c(x)=x^d$ with $d\ge 2$.

We start with the proof of the positive result. 



Proof. Let $\varepsilon>0$ be a small real number. Similar to Section 3 we consider an
online algorithm BETA2 that uses LRU as its paging strategy. This time the
online algorithm increases its cache size to $k$ as soon as it has incurred at least
$d^{\varepsilon}(k^d-1)$ page faults. We will show that this algorithm has a competitive ratio
of at most $d(1+d^{-\varepsilon})/((1-\varepsilon)\ln d)$ (asymptotically, as $d\to\infty$).

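Consider an arbitrary request sequence $\sigma$. Let $k$ denote the final cache size
of BETA2, and let $\ell$ denote the optimal offline cache size for $\sigma$. If $\ell\ge k+1$, the
offline cost is at least $(k+1)^d$ and the online cost is at most $k^d+d^{\varepsilon}(k+1)^d$. Then
the online cost is at most a factor of $1+d^{\varepsilon}$ above the offline cost. From now on
we assume that $\ell\le k$. For $i=\ell,\ldots,k$ we denote by $f_i$ the total number of page
faults that BETA2 incurs until the moment when it purchases the $i$th cache slot,
so that $f_{i+1}-f_i=d^{\varepsilon}\bigl((i+1)^d-i^d\bigr)$.
Using an argument similar to that of Lemma 2 we get that the optimal offline
algorithm incurs at least $\sum_{i=\ell}^{k-1}(f_{i+1}-f_i)(i-\ell+1)/i$ page faults during this
time. Therefore,

$$\begin{aligned}
\mathrm{cost}_{\mathrm{OPT}}(\sigma)&\ge\ell^d+\sum_{i=\ell}^{k-1}(f_{i+1}-f_i)\,\frac{i-\ell+1}{i}\\
&\ge\ell^d+d^{\varepsilon}\sum_{i=\ell}^{k-1}\bigl((i+1)^d-i^d\bigr)\,\frac{i-\ell}{i}\\
&=\ell^d+d^{\varepsilon}(k^d-\ell^d)-d^{\varepsilon}\ell\sum_{i=\ell}^{k-1}\frac{(i+1)^d-i^d}{i}\\
&\approx\ell^d+d^{\varepsilon}(k^d-\ell^d)-d^{\varepsilon}\ell\int_{\ell}^{k}d\,x^{d-2}\,dx\\
&=\ell^d+d^{\varepsilon}(k^d-\ell^d)-d^{\varepsilon}\ell\,\frac{d}{d-1}\bigl(k^{d-1}-\ell^{d-1}\bigr),
\end{aligned}$$

where the approximate step replaces $((i+1)^d-i^d)/i$ by its leading term $d\,i^{d-2}$.
The online cost is approximately $\mathrm{cost}_{\mathrm{BETA2}}(\sigma)\approx(1+d^{\varepsilon})k^d$. Substituting
$y=\ell/k\le 1$ and putting things together we get

$$R\le\frac{1+d^{-\varepsilon}}{d^{-\varepsilon}y^d+1-y^d-\frac{d}{d-1}\,(y-y^d)}. \qquad (4)$$

The denominator in the right-hand side of (4) is convex in $y$ and is minimized for
$y=y^*$ with $(y^*)^{d-1}=d^{\varepsilon}/(d-1+d^{\varepsilon})\le d^{\varepsilon-1}$, where it equals $1-y^*$; hence it is at least

$$1-d^{(\varepsilon-1)/(d-1)}.$$

By applying some (tedious) calculus we get that as $d$ tends to infinity, the
function $1-d^{(\varepsilon-1)/(d-1)}$ grows like $(1-\varepsilon)(\ln d)/d$. By combining these observations with
the inequality in (4) we conclude that the competitive ratio $R$ of BETA2 is
bounded by

$$R\lesssim\frac{(1+d^{-\varepsilon})\,d}{(1-\varepsilon)\ln d}.$$

This completes the proof of the positive statement in Theorem 2. □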
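The minimization step for (4) can be checked numerically. The sketch assumes the form of the denominator of (4) as reconstructed above; it compares a grid minimum over $y\in(0,1]$ with the closed-form minimum $1-y^*$ and with the bound $1-d^{(\varepsilon-1)/(d-1)}$, and it illustrates that this bound behaves like $(1-\varepsilon)(\ln d)/d$ for large $d$. The function names and the choice $\varepsilon=0.1$ are ours.

```python
import math


def denom(y, d, eps):
    """Denominator of (4) as reconstructed here:
    d^-eps y^d + 1 - y^d - d/(d-1) (y - y^d)."""
    return d ** (-eps) * y ** d + 1 - y ** d - d / (d - 1) * (y - y ** d)


def check(d, eps, steps=20_000):
    """Grid minimum of the denominator on (0, 1], the closed-form minimum 1 - y*,
    and the lower bound 1 - d^((eps-1)/(d-1))."""
    grid_min = min(denom(i / steps, d, eps) for i in range(1, steps + 1))
    y_star = (d ** eps / (d - 1 + d ** eps)) ** (1 / (d - 1))
    bound = 1 - d ** ((eps - 1) / (d - 1))
    return grid_min, 1 - y_star, bound


for d in (5, 20, 100):
    print(d, check(d, eps=0.1))

d = 10 ** 6  # the bound times d / ln d should be close to 1 - eps = 0.9
print((1 - d ** ((0.1 - 1) / (d - 1))) * d / math.log(d))
```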

We turn to the proof of the negative statement in Theorem 2, which is done by
an adversary argument.

Proof. Consider an $r$-competitive online algorithm for cost functions of the form
$c(x)=x^d$. The pages are numbered $1,2,3,\ldots$, and the adversary always requests
the smallest numbered page which is not in the online cache. Thus, the online
algorithm faults on every request. In order to have a finite competitive ratio, the
online algorithm cannot run forever with the same number of slots. Hence, the
number of purchased slots must eventually reach $gr$, where $g$ is a huge integer.
Suppose that after $i_g$ requests the online cache is extended for the first time to
a size $k\ge gr$. Then the corresponding online cost at this moment is at least

$$i_g+(gr)^d.$$

Now consider the optimal offline algorithm for cache size $gr-g+1$. Since
the online algorithm was serving the first $i_g$ requests with a cache size of at
most $gr-1$, the results of Sleator & Tarjan (see Proposition 1) yield that the
number of offline faults is at most

$$i_g\cdot\frac{(gr-1)-(gr-g+1)+1}{gr-1}=\frac{i_g(g-1)}{gr-1}\le\frac{i_g}{r}.$$

The offline cache cost is $(gr-g+1)^d$. Since the online algorithm is $r$-competitive,
there exists a constant $b$ such that the following inequality is fulfilled for all
integers $g$; cf. equation (1):

$$i_g+(gr)^d\le r\cdot\Bigl(\frac{i_g}{r}+(gr-g+1)^d\Bigr)+b.$$
Since $g$ can be arbitrarily large, this implies $r^d\le r(r-1)^d$, which is equivalent
to

$$\frac{1}{r}\le\Bigl(1-\frac{1}{r}\Bigr)^{d}. \qquad (5)$$

Now suppose that $r<d/\ln d$. Then the left-hand side in (5) is at least $\ln d/d$,
whereas the right-hand side is at most $e^{-d/r}<1/d$. This contradiction completes the
proof of the negative statement in Theorem 2. □
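Inequality (5) can be explored numerically: for fixed $d$ the set of $r>1$ satisfying it is an interval $[r^*(d),\infty)$, and the argument above shows $r^*(d)\ge d/\ln d$ whenever $\ln d>1$. The bisection sketch below, with the hypothetical name `smallest_r`, computes $r^*(d)$ for a few values of $d$.

```python
import math


def smallest_r(d, iters=200):
    """Smallest r > 1 with 1/r <= (1 - 1/r)^d, the condition (5).
    h(r) = (1 - 1/r)^d - 1/r is increasing in r and negative as r -> 1;
    at r = 2d, Bernoulli's inequality gives (1 - 1/(2d))^d >= 1/2 > 1/(2d),
    so bisection on [1, 2d] finds the crossing."""
    lo, hi = 1.0, 2.0 * d
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (1 - 1 / mid) ** d >= 1 / mid:
            hi = mid
        else:
            lo = mid
    return hi


for d in (3, 10, 100, 1000):
    print(d, round(smallest_r(d), 3), round(d / math.log(d), 3))
```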



6 The Results for the General Case 

In this section we prove the two results (one positive and one negative) that are
claimed in Theorem 3.

We start with the proof of the positive result. 



Proof. So we assume that the cost function $c(x)$ satisfies condition (3). Fix a
request sequence $\sigma$. We may assume that the optimal offline algorithm for $\sigma$
uses a cache of size $x_{\mathrm{OPT}}\ge X$. The case when $x_{\mathrm{OPT}}<X$ can be disregarded
by making $b$ in the definition of competitiveness sufficiently large; in fact, any $b$
greater than $c(X)$ will do.

Consider the algorithm BAL which uses LRU as its paging strategy and which
tries to balance its cache cost and its fault cost. In other words, BAL increases its
cache size to $x$ as soon as $\mathrm{fault}_{\mathrm{BAL}}\ge c(x)$. Until the time where BAL purchases
a cache of size $q\cdot x_{\mathrm{OPT}}$, the cost ratio of online to offline is at most $2p$ (up to an
additive constant $2s$, which is absorbed into $b$): At this
time $\mathrm{cache}_{\mathrm{BAL}}=c(q\,x_{\mathrm{OPT}})\le p\cdot c(x_{\mathrm{OPT}})+s$ and $\mathrm{fault}_{\mathrm{BAL}}\approx\mathrm{cache}_{\mathrm{BAL}}$, whereas
the offline cost is at least $c(x_{\mathrm{OPT}})$. From the time where BAL purchases a cache
of size $q\cdot x_{\mathrm{OPT}}$ onwards, the ratio is at most $2q/(q-1)$: By using the result of
Sleator & Tarjan as stated in Proposition 1 with $\ell=x_{\mathrm{OPT}}$ and $k=q\cdot x_{\mathrm{OPT}}$,
we get that $\mathrm{fault}_{\mathrm{BAL}}/\mathrm{fault}_{\mathrm{OPT}}\le q\,x_{\mathrm{OPT}}/(q\,x_{\mathrm{OPT}}-x_{\mathrm{OPT}}+1)$. Therefore, since
the cache cost of BAL never exceeds its fault cost,

$$\mathrm{cost}_{\mathrm{BAL}}\le 2\,\mathrm{fault}_{\mathrm{BAL}}\le\frac{2q\cdot x_{\mathrm{OPT}}\cdot\mathrm{fault}_{\mathrm{OPT}}}{q\,x_{\mathrm{OPT}}-x_{\mathrm{OPT}}+1}\le\frac{2q}{q-1}\,\mathrm{cost}_{\mathrm{OPT}}.$$

To summarize, we have shown that BAL is $\max\{2p,\,2q/(q-1)\}$-competitive.
This completes the argument for the positive result. □
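A minimal simulation of BAL follows, assuming a newly bought slot starts empty and purchases are checked before each request. The name `run_bal` is hypothetical, and the example uses $c(x)=x^2$ and a cyclic sequence chosen arbitrarily. The sketch also caps the cache at the number of distinct pages (our own assumption, which avoids an endless purchase loop for bounded $c$). It exhibits the invariant $\mathrm{cache}_{\mathrm{BAL}}\le\mathrm{fault}_{\mathrm{BAL}}$ behind the bound $\mathrm{cost}_{\mathrm{BAL}}\le 2\,\mathrm{fault}_{\mathrm{BAL}}$.

```python
from collections import OrderedDict


def run_bal(requests, c):
    """Sketch of BAL: LRU paging; the cache is grown to size x as soon as the
    number of faults is at least c(x). Returns (cache size, faults, total cost)."""
    max_size = len(set(requests))   # more slots than distinct pages never help
    size, faults = 0, 0
    cache = OrderedDict()           # LRU order: least recently used first
    for page in requests:
        # buy every slot whose price is already covered by the fault count
        while size < max_size and faults >= c(size + 1):
            size += 1
        if page in cache:
            cache.move_to_end(page)
            continue
        faults += 1
        if size == 0:
            continue                # no cache slots yet: nothing can be stored
        if len(cache) == size:
            cache.popitem(last=False)
        cache[page] = None
    return size, faults, c(size) + faults


size, faults, cost = run_bal(list(range(40)) * 10, lambda x: x * x)
print(size, faults, cost)
```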



Now let us turn to the negative result. 



Proof. So we assume that the cost function $c(x)$ does not satisfy condition
(3), and that therefore

$$\forall q>1\ \forall p>0\ \forall s>0\ \forall X>0\ \exists x\ge X:\ c(qx)>p\cdot c(x)+s. \qquad (6)$$

The idea is quite simple. If OPT uses a cache of size $x$ then an online algorithm
which wants to be $R$-competitive must eventually purchase a cache of size $px$ for
some $p\approx R/(R-1)$. The result of Sleator and Tarjan as stated in Proposition 1
requires that $R\cdot\mathrm{fault}_{\mathrm{OPT}}-\mathrm{fault}_A$ cannot be too large. On the other hand,
$c(px)-R\cdot c(x)$ can be made arbitrarily large, since $c$ is not polynomially bounded.

Now for the details. We will proceed by contradiction. Assume that there
is an algorithm $A$ which is $R$-competitive for some $R>1$ and fix $b$ as in the
definition of competitiveness. By using (6) we can choose $x$ to satisfy

$$c\Bigl(\frac{2R-1}{2R-2}\,x\Bigr)\ge R\cdot c(x)+x\,\frac{2R-1}{2}+R+b.$$

If we use the lower bound sequence from Proposition 1 for $k=x(2R-1)/(2R-2)-1$
and $\ell=x$ until $A$ purchases a cache of size $x(2R-1)/(2R-2)$, then
$R\cdot\mathrm{fault}_{\mathrm{OPT}}(\sigma)-\mathrm{fault}_A(\sigma)\le x(2R-1)/2$. Note that $A$ must eventually purchase
a cache of this size, since otherwise $\mathrm{cost}_A$ will tend to $\infty$ while
$\mathrm{cost}_{\mathrm{OPT}}\le c((2R-1)x/(2R-2))+(2R-1)x/(2R-2)$. Therefore,

$$\begin{aligned}
\mathrm{cost}_A(\sigma)&=\mathrm{cache}_A(\sigma)+\mathrm{fault}_A(\sigma)\\
&>R\cdot\mathrm{cache}_{\mathrm{OPT}}(\sigma)+R\cdot\mathrm{fault}_{\mathrm{OPT}}(\sigma)+b\\
&=R\cdot\mathrm{cost}_{\mathrm{OPT}}(\sigma)+b.
\end{aligned}$$

This contradiction completes the proof of the negative result in Theorem 3. □

7 Conclusion 

We have considered a simple model of caching which integrates the ability to 
add additional cache locations. A number of results for linear, polynomial, and 
arbitrary cost functions have been found. One possible direction for further study 
of this problem is to consider the degree to which randomization can further 
reduce the competitive ratios found in this paper. The primary difficulty when 
attacking the randomized version of the problem is finding relationships between 
the costs of the online and offline algorithms when the cache sizes are unequal. 
Abstract. A minimal perfect hash function for a set $S$ is an injective
mapping from $S$ to $\{0,\ldots,|S|-1\}$. Taking as our model of computation
a unit-cost RAM with a word length of $w$ bits, we consider the problem
of constructing minimal perfect hash functions with constant evaluation
time for arbitrary subsets of $U=\{0,\ldots,2^w-1\}$. Pagh recently
described a simple randomized algorithm that, given a set $S\subseteq U$ of
size $n$, works in $O(n)$ expected time and computes a minimal perfect
hash function for $S$ whose representation, besides a constant number of
words, is a table of at most $(2+\varepsilon)n$ integers in the range $\{0,\ldots,n-1\}$,
for arbitrary fixed $\varepsilon>0$. Extending his method, we show how to replace
the factor of $2+\varepsilon$ by $1+\varepsilon$.

Keywords: Data structures, randomized algorithms, dictionaries, hashing,
hash tables, minimal perfect hash functions, space requirements.



1 Introduction 

A minimal perfect hash function for a set $S$ is an injective mapping from $S$ to
$\{0,\ldots,|S|-1\}$. Given a minimal perfect hash function $h$ for the set of keys of
a collection $\mathcal{R}$ of $n$ records, we can store $\mathcal{R}$ efficiently to enable the retrieval of
records by their keys. If a record with key $x$ is stored in $B[h(x)]$ for some array $B$
of size $n$, the only additional space needed is that required for the representation
of $h$, and the time needed to access a record in $\mathcal{R}$ is essentially the evaluation
time of $h$.

Taking as our model of computation the unit-cost RAM with a word length
of $w$ bits, for some fixed positive integer $w$, we study the construction of
minimal perfect hash functions for arbitrary given subsets $S$ of the universe
$U=\{0,\ldots,2^w-1\}$ of keys representable in single computer words. We will
only be interested in hash functions that can be evaluated in constant time. The
remaining parameters of interest are the time needed to find a minimal perfect
hash function for a given set $S$ and the space needed to store it.

Tarjan and Yao investigated the compression of a sparsely used two-dimensional
table $A$ and considered a class of displace-and-project functions that
shift each row of $A$ horizontally by an amount associated with the row and called
its displacement and then project the shifted rows vertically to a one-dimensional
table $B$. We call displacements for a subset of the rows of $A$ compatible if no two
used entries in the shifted rows under consideration reside in the same column.
In order for the compression from $A$ to $B$ to work as intended, the displacements
of all the rows must be compatible.

Define the weight of a row in $A$ as the number of used entries in that row.
Tarjan and Yao showed that if $A$ contains $n$ used entries and satisfies a
certain harmonic-decay property, then compatible row displacements in the range
$\{0,\ldots,n-1\}$ can be found by a simple first-fit-decreasing (FFD) algorithm
that processes the rows one by one in an order of nonincreasing weight and,
for each row, chooses the smallest nonnegative displacement compatible with all
previously chosen row displacements. They also demonstrated that the harmonic-decay
property can be enforced by a preprocessing phase that shifts each column
of $A$ vertically by a suitable column displacement.
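The first-fit-decreasing rule can be sketched in a few lines. In this illustration (our own, with the hypothetical name `first_fit_decreasing`) rows are given as sets of used column indices and shifts are ordinary, non-cyclic ones; the sketch neither bounds the displacements by $n-1$ nor performs the column-displacement preprocessing that enforces harmonic decay.

```python
def first_fit_decreasing(rows):
    """rows maps a row index to the set of column indices of its used entries.
    Rows are processed by nonincreasing weight; each gets the smallest d >= 0
    such that {col + d} avoids every position of B already in use."""
    used = set()                       # occupied positions of the 1-D table B
    disp = {}
    for r in sorted(rows, key=lambda r: len(rows[r]), reverse=True):
        d = 0
        while any(col + d in used for col in rows[r]):
            d += 1
        disp[r] = d
        used.update(col + d for col in rows[r])
    return disp


rows = {0: {0, 1, 2}, 1: {0, 2}, 2: {1}, 3: set()}
print(first_fit_decreasing(rows))  # {0: 0, 1: 3, 2: 3, 3: 0}
```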

Pagh observed that for a table $A$ with $n$ columns, the row shifts of
Tarjan and Yao can be replaced by cyclic shifts (he considered a more general
operation of no concern here). More significantly, he introduced a new approach
to enforcing the harmonic-decay property. In his setting, the table $A$ is not
indexed directly by the keys of the original universe. Instead, the column and row indices of a used entry in
$A$ are the values obtained by applying suitable functions $f$ and $g$ to a key in a set
$S$ with $|S|=n$ for which a minimal perfect hash function is to be constructed.
Pagh identified a set of conditions concerning $f$ and $g$, one of which is that
$(f,g)$ is injective on $S$, and showed that if these conditions are satisfied, then a
minimal perfect hash function for $S$ can be computed in $O(n)$ expected time by
a random-fit-decreasing (RFD) algorithm that operates like the FFD algorithm,
except that the displacements tried for each row of weight at least 2 are chosen
randomly, rather than in the order $0,1,\ldots$, until a displacement compatible
with the previously chosen displacements is encountered. Complementing this,
Pagh showed that if $f$ and $g$ are chosen at random from suitable classes of
hash functions and $A$ has $m\ge(2+\varepsilon)n$ rows, for some fixed $\varepsilon>0$, then the
conditions required by the analysis of the RFD algorithm hold with a probability
bounded from below by a positive constant. Altogether, this yields a construction
in $O(n)$ expected time of a simple minimal perfect hash function for $S$ that can
be evaluated in constant time and whose representation consists of a constant
number of words (to specify $f$ and $g$) together with the $m$ row displacements. If
the chosen displacements are $d_0,\ldots,d_{m-1}$, the hash value of a key $x$ is simply
$(f(x)+d_{g(x)})\bmod n$.
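To illustrate the form $(f(x)+d_{g(x)})\bmod n$ with cyclic shifts and random-fit-decreasing placement, here is a small Python sketch. It is not Pagh's construction or the one in this paper: $f$ and $g$ are drawn from the simple stand-in family $x\mapsto((ax+b)\bmod P)\bmod r$ with $P=2^{61}-1$, the cap on random tries per row and $m=3n$ in the example are our own assumptions, and `build_mphf` is a hypothetical name. If $(f,g)$ is not injective on the keys, or a row cannot be placed, new functions are drawn.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, larger than the keys used below


def build_mphf(keys, m, seed=0, max_attempts=50):
    """Sketch of h(x) = (f(x) + d[g(x)]) mod n with random-fit-decreasing placement.
    f, g come from x -> ((a*x + b) mod P) mod r, a stand-in hash family."""
    n = len(keys)
    rng = random.Random(seed)
    for _ in range(max_attempts):
        af, bf, ag, bg = (rng.randrange(1, P) for _ in range(4))

        def f(x):
            return ((af * x + bf) % P) % n

        def g(x):
            return ((ag * x + bg) % P) % m

        cells = {(f(x), g(x)) for x in keys}
        if len(cells) < n:
            continue                       # (f, g) not injective on keys: redraw
        rows = [[] for _ in range(m)]
        for col, row in cells:
            rows[row].append(col)
        disp, free, ok = [0] * m, set(range(n)), True
        for j in sorted(range(m), key=lambda j: len(rows[j]), reverse=True):
            R = rows[j]
            if len(R) >= 2:                # random displacements for heavier rows
                for _ in range(100 * n):
                    d = rng.randrange(n)
                    if all((r + d) % n in free for r in R):
                        break
                else:
                    ok = False             # give up on this (f, g)
                    break
            elif len(R) == 1:              # a singleton goes to any free position
                d = (min(free) - R[0]) % n
            else:
                d = 0                      # empty row: displacement is irrelevant
            disp[j] = d
            free.difference_update((r + d) % n for r in R)
        if ok:
            return (lambda x: (f(x) + disp[g(x)]) % n), disp
    raise RuntimeError("no suitable (f, g) found")


keys = random.Random(1).sample(range(10 ** 9), 500)
h, disp = build_mphf(keys, m=3 * len(keys))
print(sorted(h(x) for x in keys) == list(range(len(keys))))
```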

Taking Pagh’s construction as our starting point, we reduce the minimal
number of row displacements from $(2+\varepsilon)n$ to $(1+\varepsilon)n$, for arbitrary fixed $\varepsilon>0$,
thus essentially halving the space requirements of the minimal perfect hash
function. In order to achieve this, we introduce a new algorithm for computing
row displacements, the Undo-One algorithm. Except for the initial sorting of
the rows by nonincreasing weight, the FFD and RFD algorithms are both online
in the sense that they choose each displacement taking into account only rows
whose displacements were already chosen. Informally, a displacement, once fixed,
is never again changed. The Undo-One algorithm deviates slightly from this
principle by allowing the computation of each displacement to change a single
displacement that was chosen earlier. Informally, if it is difficult to place the row
at hand, it is permissible to make room for it by relocating a single previously
placed row. This added flexibility allows the Undo-One algorithm to cope with
a higher level of crowding in the array $A$, which translates into a smaller value
of $m$.

Concretely, we formulate new conditions concerning $f$ and $g$, more
permissive than those of Pagh, and show that these conditions enable the Undo-One
algorithm to succeed in $O(n)$ expected time. On the other hand, we show that
functions $f$ and $g$ that satisfy the new conditions can be found in $O(n)$ expected
time as long as $m\ge(1+\varepsilon)n$, for arbitrary fixed $\varepsilon>0$. Our proof of this makes
heavier demands on the class of hash functions from which $g$ is drawn than does
Pagh’s analysis: the values under $g$ should behave approximately as if they were
uniformly distributed and independent random variables. We show, extending
results of [2] slightly, that a standard class of hash functions (remainders of
polynomials of sufficiently high degree over a finite field) behaves sufficiently
randomly.

Both parts of our argument, analyzing the Undo-One algorithm and
establishing the required properties of $f$ and $g$, are more involved than the
corresponding parts of Pagh’s analysis. The form of the resulting minimal perfect
hash function, however, $x\mapsto(f(x)+d_{g(x)})\bmod n$, is exactly the same as in
Pagh’s construction if $n$ is odd, except that $g$ must now be chosen from a class
of hash functions that meets additional requirements.

A comprehensive discussion of related work can be found in a survey by
Czech et al. In a seminal paper, Fredman et al. described a randomized
construction of a static dictionary with constant access time for a given set of
$n$ keys. The construction works in $O(n)$ expected time and is easily modified to
yield a minimal perfect hash function. The space requirements are $O(1)$ words
of $w$ bits plus $O(n)$ words of $O(\log n)$ bits, for a total of $O(w+n\log n)$ bits.
Fredman and Komlós showed that the minimal number of bits needed to
represent minimal perfect hash functions is $n\log_2 e+\log_2 w-O(\log n)$. Schmidt
and Siegel proved that a constant access time can be achieved together
with space requirements of $O(n+\log w)$, but they did not describe an efficient
construction algorithm. Hagerup and Tholey closed this gap by exhibiting a
randomized construction that works in $O(n+\log w)$ expected time, and they
also reduced the space requirements further to $n\log_2 e+\log_2 w+o(n+\log w)$,
i.e., to within lower-order terms of the optimum.

The work of Pagh cited above as well as other efforts in the area of minimal
perfect hashing do not emphasize the space minimization quite so heavily, but
rather try to keep the number of probes into the data structure as small as
possible. Pagh’s algorithm, as well as the new algorithm described here, needs
one (input-independent) access to the description of $f$ and $g$ and one access to
the table of displacements. In a different line of research, building upon a long
series of earlier work, Czech et al. and Havas et al. studied the construction


112 M. Dietzfelbinger and T. Hagerup 



of minimal perfect hash functions of the form $h(x)=\bigl(\sum_{i=1}^{t}d_{g_i(x)}\bigr)\bmod n$, where
$t$ is a constant, $g_i$ maps $U$ to $\{0,\ldots,m-1\}$, for $i=1,\ldots,t$, and $d_0,\ldots,d_{m-1}$
are suitably chosen values in $\{0,\ldots,n-1\}$. Storing the description of such
a function essentially amounts to storing $d_0,\ldots,d_{m-1}$. Assuming an idealized
situation in which $g_1,\ldots,g_t$ behave fully randomly on $S$, the authors utilized
results on threshold values for random (hyper)graphs to be acyclic to show that
$m=(2+\varepsilon)n$ is a suitable choice for $t=2$, and they demonstrated experimentally
that $m\approx 1.23n$ is one for $t=3$. In the latter case, evaluating $h$ requires three
accesses to the table containing $d_0,\ldots,d_{m-1}$. Similarly, functions of the form
$h(x)=(f(x)+d_{g_1(x)}+d_{g_2(x)})\bmod n$, where $f:U\to\{0,\ldots,n-1\}$, $g_1:U\to\{0,\ldots,m/2-1\}$,
and $g_2:U\to\{m/2,\ldots,m-1\}$, were proposed by Fox
et al. and stated to work well (meaning successful construction of minimal
perfect hash functions in $O(n)$ expected time) for $m$ as small as $0.6n$. However,
only experimental evidence and no formal arguments were offered in support of
this claim.



2 The Undo-One Algorithm 

Let $n$ and $m$ be positive integers and suppose that $S$ is a set of size $n$ and
that $f$ and $g$ are functions mapping $S$ to $\{0, \ldots, n-1\}$ and to $\{0, \ldots, m-1\}$,
respectively, so that $(f, g)$ is injective on $S$. In terms of the informal description of
the previous section, $f(x)$ and $g(x)$ are the column and row indices, respectively,
of the cell in $A$ to which $x \in S$ is mapped by $(f, g)$. For $0 \le j < m$, let
$R_j = \{f(x) : x \in S \text{ and } g(x) = j\}$. Informally, $R_j$ is the set of used positions in row $j$
of $A$. Correspondingly, we call $R_j$ a row set. Given a row set $R$ and an integer $d$,
we define $R \oplus d$ to be the set $\{(r + d) \bmod n : r \in R\}$ obtained by “rotating $R$ by
$d$ positions to the right”. The goal of the Undo-One algorithm (and of the RFD
algorithm) is to compute integers $d_0, \ldots, d_{m-1}$ in the range $\{0, \ldots, n-1\}$, called
(row) displacements, for which the sets $R_0 \oplus d_0, \ldots, R_{m-1} \oplus d_{m-1}$ are disjoint.
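As a concrete illustration of these definitions, the following Python sketch (our own; the helper names `row_sets`, `rotate` and `displacements_valid` are hypothetical) builds the row sets $R_j$ from given functions f and g, implements the rotation $R \oplus d$, and checks whether a vector of displacements makes the rotated row sets disjoint.

```python
def row_sets(keys, f, g, m):
    """Return [R_0, ..., R_{m-1}] with R_j = {f(x) : x in keys and g(x) = j}."""
    rows = [set() for _ in range(m)]
    for x in keys:
        rows[g(x)].add(f(x))
    return rows


def rotate(row, d, n):
    """R (+) d: rotate the row set R cyclically by d positions to the right."""
    return {(r + d) % n for r in row}


def displacements_valid(rows, disp, n):
    """True iff the rotated sets R_0 (+) d_0, ..., R_{m-1} (+) d_{m-1} are disjoint."""
    used = set()
    for row, d in zip(rows, disp):
        shifted = rotate(row, d, n)
        if used & shifted:
            return False
        used |= shifted
    return True
```

For example, with $n = 5$ the row sets $\{0, 1, 2\}$ and $\{0, 1\}$ collide at displacements $(0, 0)$ but become disjoint at displacements $(0, 3)$.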

For $0 \le j < m$ and $0 \le d < n$, by “placing $R_j$ at (displacement) $d$” we will
mean fixing the value of $d_j$ to be $d$. At a time when exactly the row sets $R_j$ with
indices $j$ in some set $J$ have been placed, we call $V = \bigcup_{j \in J}(R_j \oplus d_j)$ the set of
occupied positions and $W = \{0, \ldots, n-1\} \setminus V$ the set of free positions. Two row
sets $R$ and $R'$, placed at $d$ and $d'$, collide if $(R \oplus d) \cap (R' \oplus d') \neq \emptyset$. A row set
$R$ will be called large if $|R| \ge 3$, and it is a pair if $|R| = 2$ and a singleton if
$|R| = 1$.

The Undo-One algorithm begins by computing a permutation $\sigma$ of $\{0, \ldots,
m-1\}$ with $|R_{\sigma(0)}| \ge |R_{\sigma(1)}| \ge \cdots \ge |R_{\sigma(m-1)}|$ and then places $R_{\sigma(l)}$ for
$l = 0, \ldots, m-1$. Observe that once all larger row sets have been placed, it is
trivial to place the singletons in linear time. This is because, at that time, the
number of free positions exactly equals the number of singletons. For this reason,
we need not describe how to place singletons (nor how to place empty row sets).

Large row sets are simply placed according to the RFD algorithm. When
attempting to place a pair $P$, the Undo-One algorithm works harder than for
large row sets. It first tries to place $P$ using a random displacement, just as
the RFD algorithm does. If $P$ collides with exactly one previously placed row set
$R$, however, the algorithm does not give up right away, but instead tries to
make room for $P$ at the displacement chosen for it by relocating $R$ to a random
displacement. Only if this also fails is a new attempt to place $P$ started. It turns
out to be essential for the analysis that the alternative displacement used for the
tentative relocation of $R$ is not chosen according to the uniform distribution over
$\{0, \ldots, n-1\}$. If the old displacement of $R$ is $d$ and $q = \max P - \min P$, the
tentative new displacement is found as follows: With probability 1/2, a random
value is chosen from the uniform distribution over $\{(d-q) \bmod n, (d+q) \bmod n\}$,
and with probability 1/2, a random value is chosen from the uniform distribution
over $\{0, \ldots, n-1\}$.
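The following Python function is a minimal sketch of the placement phase described above. It assumes that the row sets are given as Python sets of column indices whose sizes sum to at most $n$. Large row sets are placed by plain random displacements (the RFD rule), the undo-one relocation is applied only to pairs, and singletons are filled in at the end. The parameter `max_trials` is a hypothetical safety cap that the algorithm in the text does not have; this is an illustration, not an implementation from the paper.

```python
import random


def undo_one(rows, n, rng=random, max_trials=10**6):
    """Sketch of Undo-One: return displacements d_j with disjoint R_j (+) d_j.

    rows: list of sets of column indices in range(n), sizes summing to <= n.
    max_trials: hypothetical per-row-set safety cap (not part of the algorithm).
    """
    disp = [0] * len(rows)
    owner = [None] * n  # owner[v]: index of the row set occupying position v

    def cells(j, d):
        return [(r + d) % n for r in rows[j]]

    def put(j, d):
        disp[j] = d
        for v in cells(j, d):
            owner[v] = j

    def lift(j):
        for v in cells(j, disp[j]):
            owner[v] = None

    order = sorted(range(len(rows)), key=lambda j: len(rows[j]), reverse=True)
    for j in order:
        if len(rows[j]) <= 1:
            continue  # singletons and empty row sets are handled at the end
        for _ in range(max_trials):
            d = rng.randrange(n)
            hit = {owner[v] for v in cells(j, d) if owner[v] is not None}
            if not hit:  # plain RFD step succeeded
                put(j, d)
                break
            if len(rows[j]) == 2 and len(hit) == 1:
                # Undo-One step: tentatively move the single colliding row set k.
                (k,) = hit
                old = disp[k]
                q = max(rows[j]) - min(rows[j])
                if rng.random() < 0.5:
                    new = rng.choice([(old - q) % n, (old + q) % n])
                else:
                    new = rng.randrange(n)
                lift(k)
                target = set(cells(j, d))
                if all(owner[v] is None and v not in target for v in cells(k, new)):
                    put(k, new)
                    put(j, d)
                    break
                put(k, old)  # relocation failed: restore k, start a new trial
        else:
            raise RuntimeError("no displacement found within max_trials")
    free = [v for v in range(n) if owner[v] is None]
    for j in order:
        if len(rows[j]) == 1:
            (r,) = rows[j]
            put(j, (free.pop() - r) % n)  # any free position works for a singleton
    return disp
```

The occupancy array `owner` mirrors the bookkeeping assumed in the proof of Lemma 2 (an indication, for each position, of the row set occupying it).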

3 Analysis of the Undo-One Algorithm 

We will analyze the Undo-One algorithm assuming $n$ to be odd. The analysis
centers on the distribution of row-set sizes. For $k = 0, 1, \ldots$, let $T_k = |\{0 \le j <
m : |R_j| = k\}|$ be the number of row sets of size $k$. In this section, the following
conditions will be assumed to hold for some $\nu > 0$:

(A) The function $(f, g)$ is injective on $S$;

(B) $(1 + \nu)\sum_{k \ge 3} k^2 T_k < n$;

(C) $T_1 + \sum_{k \ge 1} T_k \ge (1 + \nu)n$.

Define a trial to be a single attempt by the Undo-One algorithm, as described 
in the previous section, to place a row set. We consider the placing of large row 
sets and the placing of pairs separately (as observed in the previous section, we 
need not concern ourselves with the placing of smaller row sets). The analysis 
of the placing of large row sets is exactly as Pagh’s analysis, except that our 
Condition (B) pertains only to large row sets and therefore enables the placing 
of large row sets only. 

Lemma 1. For constant $\nu$, the Undo-One algorithm places all large row sets in
$O(n)$ expected time.

Proof. Consider the placing of a fixed large row set $R$ and let $k_0 = |R| \ge 3$.
Each trial chooses $d \in \{0, \ldots, n-1\}$ uniformly at random and succeeds unless
$(R \oplus d) \cap V \neq \emptyset$, where $V$ is the set of occupied positions. The latter happens with
probability at most $k_0|V|/n$. Since $|R'| \ge k_0$ for all previously placed row sets
$R'$, we have $k_0|V|/n \le \sum_{k \ge k_0} k_0 k T_k/n \le \sum_{k \ge k_0} k^2 T_k/n$. By Condition (B)
and because $k_0 \ge 3$, the latter quantity is bounded by $1/(1 + \nu) < 1$. For
constant $\nu$, the expected number of trials to place $R$ is therefore constant. Each
trial can be carried out in $O(k_0)$ time, so that $R$ is placed in $O(k_0)$ expected
time. Over all large row sets, this sums to $O(n)$ expected time.
From now on we consider the placing of a fixed pair $P$ and define $V$ and $W$
as the set of occupied and free positions, respectively, before the placing of $P$.

Lemma 2. A trial can be executed in constant expected time.

Proof. Associate a random variable $K$ with a trial in the following way: If $P$
collides with a single row set $R$, then let $K = |R|$; otherwise take $K = 1$. If
we maintain for each position in $\{0, \ldots, n-1\}$ an indication of the row set
by which it is occupied, if any, it is easy to execute a trial in $O(K)$ time. For
$k \ge 2$, $\Pr(K = k) \le |P|\,k\,T_k/n$. Therefore, by Condition (B),
$E(K) = O\bigl(1 + (2/n)\sum_{k \ge 3} k^2 T_k\bigr) = O(1)$.

Let $q = \max P - \min P$. We will say that two elements $v$ and $v'$ of $\{0, \ldots, n-1\}$
are $q$-spaced if $v' = (v + q) \bmod n$ or $v' = (v - q) \bmod n$, i.e., if $v$ and $v'$ are
at a “cyclic distance” of $q$ in the additive group $\mathbb{Z}_n$. Let $G = (V \cup W, E)$ be
the undirected bipartite graph on the vertex sets $V$ and $W$ that contains an
edge $\{v, v'\}$ between $v \in V$ and $v' \in W$ precisely if $v$ and $v'$ are $q$-spaced. The
maximum degree of $G$ is bounded by 2. We will call the vertices in $V$ and $W$
black and white, respectively.

Lemma 3. G is acyclic. 

Proof. Assume that $G$ contains a simple cycle of length $l$. Then $l$ is the order
of the element $q$ in the additive group $\mathbb{Z}_n$. By Lagrange’s theorem, $l$ divides
$|\mathbb{Z}_n| = n$, which is odd by assumption. On the other hand, $l$ is even because $G$
is bipartite, a contradiction.
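Lemma 3 reduces to a fact about $\mathbb{Z}_n$: repeatedly adding $q$ returns to the starting point after exactly $n/\gcd(n, q)$ steps, the order of $q$, and this number divides $n$. The short Python check below is only an illustration (not part of the paper); it confirms that the cycle length is odd for every odd $n$ and every $q$ in a small range, whereas an even $n$ can yield an even cycle.

```python
from math import gcd


def cycle_length(q, n):
    """Order of q in Z_n: length of the cycle v -> v + q -> v + 2q -> ... (mod n)."""
    return n // gcd(n, q)


def walk_length(q, n, start=0):
    """Count the steps of v -> (v + q) mod n until the walk returns to start."""
    v, steps = (start + q) % n, 1
    while v != start:
        v, steps = (v + q) % n, steps + 1
    return steps
```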



Lemma 4. For constant $\nu$, the expected number of trials is constant.

Proof. It suffices to bound the success probability of a trial from below by a positive
constant. We will assume without loss of generality that $\nu \le 1/2$. Observe
first that since $\sum_{k \ge 1} T_k \le n$, Condition (C) implies that $|W| \ge T_1 \ge \nu n$. We
consider two cases, depending on the number of edges in $E$.

Case 1: $|E| \le 2(1 - \nu)|W|$. Let $E' = \{\{v, v'\} : v, v' \in W$ and $v$ and $v'$
are $q$-spaced$\}$ and suppose that we inserted the edges in $E'$ in $G$. This would
raise the degree of each white vertex to exactly 2; i.e., $|E| + 2|E'| = 2|W|$
and $|E'| = |W| - |E|/2 \ge \nu|W| \ge \nu^2 n$. It is easy to see that there is a 1-1
correspondence between $E'$ and the set of displacements at which $P$ can be
placed without collisions and without relocating another row set. Therefore a
trial succeeds with probability at least $\nu^2$.

Case 2: $|E| > 2(1 - \nu)|W|$. For each row set $R$ placed at a displacement $d$,
we call the subgraph of $G$ spanned by the edges incident on vertices in $R \oplus d$
the row graph of $R$. A row graph with $k$ black vertices can have at most $2k$
edges. It is called good if it has $2k - 1$ or $2k$ edges. In order to establish a
lower bound on the number of good row graphs, let us imagine that we remove




Simple Minimal Perfect Hashing in Less Space 115 



$\min\{2k - 2, r\}$ edges from each row graph with exactly $k$ black vertices and $r$
edges. The number of edges removed is at most $\sum_{k \ge 2}(2k - 2)T_k$, and therefore
the number of remaining edges is larger than

$$\begin{aligned}
2(1 - \nu)|W| - \sum_{k \ge 2}(2k - 2)T_k &\ge 2\Bigl((1 - \nu)T_1 - \sum_{k \ge 1}(k - 1)T_k\Bigr)\\
&= 2\Bigl(T_1 + \sum_{k \ge 1}T_k - \sum_{k \ge 1}kT_k - \nu T_1\Bigr) \ge 2\bigl((1 + \nu)n - n - \nu T_1\bigr)\\
&= 2\nu(n - T_1) \ge 2\nu|V| \ge \nu|E| > 2\nu(1 - \nu)|W| \ge \nu^2 n.
\end{aligned}$$

Each remaining edge belongs to a good row graph, and no row graph contains
more than two of the remaining edges. Therefore the number of good row graphs
is at least $(\nu^2/2)n$.

Since G is acyclic and has maximum degree at most 2, every connected 
component of a row graph is a simple path. We call such a path perfect if its 
first and last vertices are white. It is easy to see that a good row graph can have 
at most one connected component that is not a perfect path. We call a row set 
perfect if its row graph is good and has at least one perfect path. Let us consider 
two subcases. 

Subcase 2.1: There are at least $(\nu^2/4)n$ perfect row sets. It is easy to see that
a perfect row set $R_j$ can be rotated either left or right by $q$ positions without
colliding with any other row set. Moreover, this necessarily leaves two free
$q$-spaced positions at the end of a former perfect path, in which $P$ can be placed
(see Fig. 1). A trial that attempts to place $P$ in these two positions will succeed
with probability at least 1/4, namely if the displacement used for the tentative
relocation of $R_j$ is chosen to be one particular value in $\{(d_j - q) \bmod n,
(d_j + q) \bmod n\}$. Thus each trial succeeds with probability at least $\nu^2/16$.
Fig. 1. A perfect row set $R$ is rotated right by $q$ positions to make room for $P$.



Subcase 2.2: There are fewer than $(\nu^2/4)n$ perfect row sets. Assume first that
$n \ge 2(4/\nu^2)$. We call a row set $R$ regular if its row graph is a simple path with
one black and one white end vertex. If the row graph of a row set $R$ is good yet
$R$ is not perfect, then $R$ is regular. Since the number of good row graphs is at
least $(\nu^2/2)n$, at least $(\nu^2/4)n$ row sets are regular. The key observation is that
every regular row set can be relocated to “mesh with” any other regular row set
that is at least as large, and that this leaves at least two free $q$-spaced positions,
in which $P$ can be placed (see Fig. 2). This shows that a trial succeeds with
probability at least
Fig. 2. Two regular row sets are “meshed” to make room for $P$.



Assume finally that $n < 2(4/\nu^2)$. Since the number of good row graphs is
positive, at least one white vertex is not of degree 2. But then a trial succeeds
with probability at least $1/n > \nu^2/8$.



4 Analysis of the Hash Functions 



In this section we show how to find functions $f$ and $g$ that satisfy Conditions
(A)-(C). Both $f$ and $g$ will be picked from a family of the form

$$\mathcal{H}^t_{p,s} = \Bigl\{\, x \mapsto \Bigl(\Bigl(\sum_{i=0}^{t-1} a_i x^i\Bigr) \bmod p\Bigr) \bmod s \;:\; 0 \le a_0, \ldots, a_{t-1} < p \,\Bigr\},$$

where $t$ is a positive integer, $p > \max S$ is a prime, and $s = n$ in the case of $f$,
while $s = m$ in the case of $g$.
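A minimal sketch of drawing a member of $\mathcal{H}^t_{p,s}$, assuming Python integers and a caller-supplied prime `p`; `draw_from_H` is a hypothetical name. The coefficients are drawn independently and uniformly from $\{0, \ldots, p-1\}$, as in the random experiment considered below.

```python
import random


def draw_from_H(t, p, s, rng=random):
    """Draw h from H^t_{p,s}; returns the function and its coefficients a_0..a_{t-1}."""
    coeffs = [rng.randrange(p) for _ in range(t)]

    def h(x):
        # ((sum_{i<t} a_i x^i) mod p) mod s, using modular powers to keep numbers small
        return sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p % s

    return h, coeffs
```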

It is well-known that the unique-interpolation property of polynomials over 
arbitrary fields can be expressed as follows in our setting. 



Proposition 5. Let $x_1, \ldots, x_t \in S$ be distinct and choose $a_0, \ldots, a_{t-1}$ independently
from the uniform distribution over $\{0, \ldots, p-1\}$. Then the random
variables $\bigl(\sum_{i=0}^{t-1} a_i x_j^i\bigr) \bmod p$, for $j = 1, \ldots, t$, are independent and uniformly
distributed over $\{0, \ldots, p-1\}$.

We consider the random experiment of obtaining a function $h \in \mathcal{H}^t_{p,s}$ by
choosing the coefficients $a_0, \ldots, a_{t-1}$ independently from the uniform distribution
over $\{0, \ldots, p-1\}$. Using Proposition 5, it is easy to see that for arbitrary
$x \in S$ and $j \in \{0, \ldots, s-1\}$, $\Pr(h(x) = j) = \pi_{s,j}$, where

$$\pi_{s,j} = \begin{cases} (1/p)\lceil p/s \rceil, & \text{if } j < p \bmod s;\\ (1/p)\lfloor p/s \rfloor, & \text{if } j \ge p \bmod s. \end{cases}$$
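The formula for $\pi_{s,j}$ only reflects how the $p$ equally likely values of the polynomial modulo $p$ fall into the $s$ residue classes modulo $s$. The sketch below (an illustration using exact fractions, not code from the paper) compares the closed form with a direct count.

```python
from fractions import Fraction


def pi_formula(p, s, j):
    """pi_{s,j} from the text: ceil(p/s)/p if j < p mod s, else floor(p/s)/p."""
    if j < p % s:
        return Fraction(-(-p // s), p)  # -(-p // s) equals ceil(p / s)
    return Fraction(p // s, p)


def pi_by_counting(p, s, j):
    """Fraction of the p equally likely values y in {0, ..., p-1} with y mod s == j."""
    return Fraction(sum(1 for y in range(p) if y % s == j), p)
```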
For every integer $i$, denote by $\binom{S}{i}$ the set of all subsets of $S$ of size $i$. Given a
subset $X$ of $S$, let $\delta(X)$ be the random variable that takes the value 1 if $h$ is
constant on $X$, and 0 otherwise. By another application of Proposition 5, for
arbitrary $i \in \{1, \ldots, t\}$ and $X \in \binom{S}{i}$,

$$E(\delta(X)) = \mu_{s,i},$$

where $\mu_{s,i} = \sum_{j=0}^{s-1} \pi_{s,j}^i$. Since $\sum_{j=0}^{s-1} \pi_{s,j} = 1$, Jensen’s inequality shows that
$\mu_{s,i} \ge s^{1-i}$, and since $\pi_{s,j} \le (1/p)(1 + p/s) = (1/s)(1 + s/p)$ for $j = 0, \ldots, s-1$,
we also have $\mu_{s,i} \le s^{1-i}(1 + s/p)^i$. For every fixed $t$, we can choose $p$ sufficiently
large to make $c = (1 + m/p)^t$ arbitrarily close to 1.
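The two bounds on $\mu_{s,i}$ can be checked numerically with exact rational arithmetic. The sketch below is our own illustration; it computes $\mu_{s,i}$ from the closed form of $\pi_{s,j}$ and tests $s^{1-i} \le \mu_{s,i} \le s^{1-i}(1 + s/p)^i$ for a few parameter values.

```python
from fractions import Fraction


def mu(p, s, i):
    """mu_{s,i} = sum_j pi_{s,j}^i: probability that h is constant on a fixed i-set."""
    q, rem = divmod(p, s)
    pis = [Fraction(q + 1 if j < rem else q, p) for j in range(s)]
    return sum(x ** i for x in pis)
```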

We next prove that if $g$ has been chosen to satisfy Conditions (B) and (C)
(which depend only on $g$), then in $O(n)$ additional expected time, we can find a
function $f$ so that $f$ and $g$ together satisfy Condition (A). The procedure is to
choose $f$ repeatedly from $\mathcal{H}^t_{p,n}$ until Condition (A) holds. For the analysis, note
first that by virtue of Condition (C),

$$2T_1 + T_2 + \sum_{k \ge 3} T_k > n = T_1 + 2T_2 + \sum_{k \ge 3} kT_k,$$

from which we can conclude that $T_1 > T_2$ and, subsequently, that $T_2 < n/3$.
With $b_j = |\{x \in S : g(x) = j\}|$, for $j = 0, \ldots, m-1$, the expected number of
collisions under $(f, g)$ (i.e., of pairs $\{x, y\} \subseteq S$ with $f(x) = f(y)$ and $g(x) = g(y)$)
is

$$\sum_{j=0}^{m-1}\binom{b_j}{2}\mu_{n,2} \le \frac{c}{n}\Bigl(T_2 + \frac{1}{2}\sum_{k \ge 3}k^2T_k\Bigr) < \frac{c}{n}\Bigl(\frac{n}{3} + \frac{n}{2}\Bigr) = \frac{5c}{6},$$

where Condition (B) was used in the second-to-last step. Provided that $p$ is
chosen sufficiently large to make $c < 6/5$, the expected number of collisions is
therefore less than one. By Markov’s inequality, this means that the expected
number of trials needed to satisfy Condition (A) is bounded by a constant.

We now turn to Conditions (B) and (C). Take $b_j = |\{x \in S : g(x) = j\}|$ as
above, for $j = 0, \ldots, m-1$, and define $C_i = \sum_{j=0}^{m-1}\binom{b_j}{i} = \sum_{X \in \binom{S}{i}} \delta(X)$, for
$i = 0, 1, \ldots$ The following relation between the quantities $T_k$ and $C_i$ is obtained
from the “classical” Bonferroni inequalities [?, Inequality 12] by summation over
all $j \in \{0, \ldots, m-1\}$.

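Proposition 6. For all integers $k, l \ge 0$,

$$\sum_{i=k}^{k+2l+1}(-1)^{i-k}\binom{i}{k}C_i \;\le\; T_k \;\le\; \sum_{i=k}^{k+2l}(-1)^{i-k}\binom{i}{k}C_i.$$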






We now find

$$\sum_{k \ge 3}k^2T_k = \sum_{k \ge 1}k^2T_k - T_1 - 4T_2 = \sum_{j=0}^{m-1}\Bigl(2\binom{b_j}{2} + \binom{b_j}{1}\Bigr) - T_1 - 4T_2 = 2C_2 + C_1 - T_1 - 4T_2.$$

Continuing using Proposition 6, we obtain

$$\begin{aligned}
\sum_{k \ge 3}k^2T_k &\le 2C_2 + C_1 - (C_1 - 2C_2 + 3C_3 - 4C_4 + 5C_5 - 6C_6)\\
&\quad - 4(C_2 - 3C_3 + 6C_4 - 10C_5) = 9C_3 - 20C_4 + 35C_5 + 6C_6.
\end{aligned}$$
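Both the identity for $\sum_{k \ge 3}k^2T_k$ and the Bonferroni-based upper bound are statements about an arbitrary list of bucket sizes $b_0, \ldots, b_{m-1}$, so they can be sanity-checked directly. In the sketch below (an illustration; the helper names `T`, `C` and `check` are hypothetical), $T_k$ and $C_i$ are computed from a list of bucket sizes, and the three quantities of the derivation are returned.

```python
from math import comb


def T(b, k):
    """T_k: number of buckets (row sets) of size exactly k."""
    return sum(1 for x in b if x == k)


def C(b, i):
    """C_i = sum_j binom(b_j, i): number of i-subsets of S sharing a bucket."""
    return sum(comb(x, i) for x in b)


def check(b):
    """Return (sum_{k>=3} k^2 T_k, 2C_2 + C_1 - T_1 - 4T_2, 9C_3 - 20C_4 + 35C_5 + 6C_6)."""
    lhs = sum(k * k * T(b, k) for k in range(3, max(b, default=0) + 1))
    identity = 2 * C(b, 2) + C(b, 1) - T(b, 1) - 4 * T(b, 2)
    bound = 9 * C(b, 3) - 20 * C(b, 4) + 35 * C(b, 5) + 6 * C(b, 6)
    return lhs, identity, bound
```

For a single bucket of size 4, all three quantities equal 16, so the bound can be tight.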
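Consider repeated trials that pick $g$ randomly from $\mathcal{H}^t_{p,m}$. For $i = 1, \ldots, t$,

$$E(C_i) = \binom{n}{i}\mu_{m,i} \le \frac{n^i}{i!}\cdot c\,m^{1-i} = c\,\frac{\alpha^{i-1}}{i!}\,n,$$

where $\alpha = n/m$ is the load factor. Moreover, for every fixed $t$,

$$E(C_i) \ge (1 - o(1))\,\frac{\alpha^{i-1}}{i!}\,n.$$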

The upper and lower bounds on $E(C_i)$ show that for fixed $t \ge 6$,

$$E\Bigl(\sum_{k \ge 3}k^2T_k\Bigr) \le (1 + o(1))\,c\,n\,\phi(\alpha),$$

where $\phi(\alpha) = \frac{3}{2}\alpha^2 - \frac{5}{6}\alpha^3 + \frac{7}{24}\alpha^4 + \frac{1}{120}\alpha^5$. We are interested only in the range
$0 < \alpha \le 1$, corresponding to $m \ge n$. It is easy to see that $\phi$ is increasing in
this range, so that $\phi(\alpha) \le \phi(1) = 29/30$. Thus, provided that $\nu$ is sufficiently small
and $p$ is sufficiently large to make $\frac{29}{30}(1 + \nu)c < 1$, the expected number of trials
needed to satisfy Condition (B) is bounded by a constant for sufficiently large
values of $n$.
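The value $\phi(1) = 29/30$ and the monotonicity of $\phi$ can be verified with exact arithmetic. In this illustrative sketch, the dictionary `COEFFS` (a hypothetical name) records the coefficients 9, -20, 35 and 6 of $C_3, \ldots, C_6$, and replacing $E(C_i)$ by $\alpha^{i-1}n/i!$ yields $\phi$.

```python
from fractions import Fraction
from math import factorial

# Coefficients of C_3, ..., C_6 in the bound 9C_3 - 20C_4 + 35C_5 + 6C_6.
COEFFS = {3: 9, 4: -20, 5: 35, 6: 6}


def phi(alpha):
    """phi(alpha) = sum_i coeff_i * alpha^(i-1) / i!, i.e. E(C_i) replaced by alpha^(i-1) n / i!."""
    return sum(Fraction(c, factorial(i)) * alpha ** (i - 1) for i, c in COEFFS.items())
```

Evaluating `phi(Fraction(1))` reproduces $9/6 - 20/24 + 35/120 + 6/720 = 29/30$.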

The analysis in the case of Condition (C) is similar. We first observe that
$T_1 + \sum_{k \ge 1} T_k = T_1 + (m - T_0)$. By Proposition 6, for all even $l \ge 2$ we have

$$T_1 + \sum_{k \ge 1}T_k \ge \sum_{i=1}^{l}(-1)^{i-1}(i + 1)C_i$$

and, if $t \ge l$,

$$E\Bigl(T_1 + \sum_{k \ge 1}T_k\Bigr) \ge \ldots$$
Consider the infinite series

$$\psi(\alpha) = \sum_{i=1}^{\infty}(-1)^{i-1}(i + 1)\frac{\alpha^{i-1}}{i!} = e^{-\alpha} + \frac{1}{\alpha}\bigl(1 - e^{-\alpha}\bigr).$$

We have $\psi(1) = 1$, and differentiation shows that $\psi'(\alpha) < 0$ for $0 < \alpha < 1$, so
that $\psi(\alpha) > 1$ for all $\alpha$ with $0 < \alpha < 1$. It follows that if $\nu$ is sufficiently small
and $n$ and $t$ are sufficiently large, the expected number of trials needed to satisfy
Condition (C) is bounded by a constant.
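A numeric check of the closed form of $\psi$ and of the claims $\psi(1) = 1$ and $\psi(\alpha) > 1$ for $0 < \alpha < 1$. This is a floating-point illustration with a truncated series (60 terms, our choice), not a proof.

```python
import math


def psi_series(alpha, terms=60):
    """Partial sum of sum_{i>=1} (-1)^(i-1) (i+1) alpha^(i-1) / i!."""
    return sum((-1) ** (i - 1) * (i + 1) * alpha ** (i - 1) / math.factorial(i)
               for i in range(1, terms + 1))


def psi_closed(alpha):
    """Closed form e^(-alpha) + (1 - e^(-alpha)) / alpha, valid for alpha > 0."""
    return math.exp(-alpha) + (1 - math.exp(-alpha)) / alpha
```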

We have shown how to satisfy Conditions (B) and (C) separately. However,
we need to satisfy the two conditions simultaneously. We argue that this is possible
by showing that for $i \le t/2$, the random quantity $C_i$ is sharply concentrated
around its mean, so that picking $g$ at random from $\mathcal{H}^t_{p,m}$ for suitable $p$ and $t$
satisfies (B) and (C) not only with a probability that is bounded away from zero,
but in fact with a probability that tends to 1 as $n$ tends to infinity. By Chebyshev’s
inequality, it suffices to prove that $\mathrm{Var}(C_i) = O(n)$. This fact, which we
establish below, completes the overall argument.

Theorem 7. Let $n$, $s$ and $t$ be positive integers, $p$ a prime, $i \le t/2$ a nonnegative
integer and $S$ a subset of $\{0, \ldots, p-1\}$ of size $n$ such that $\alpha = n/s$, $i$ and $ts/p$ are
all bounded by constants. Suppose that $h$ is drawn uniformly at random from $\mathcal{H}^t_{p,s}$
and define $b_j = |\{x \in S : h(x) = j\}|$, for $j = 0, \ldots, s-1$, and $C_i = \sum_{j=0}^{s-1}\binom{b_j}{i}$.
Then $\mathrm{Var}(C_i) = O(n)$.

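Proof. Assume that $i \ge 1$, since the claim is obvious for $i = 0$. As argued near
the beginning of this section, the value of $E(\delta(X))$, where $X$ is a nonempty
subset of $S$ of size $k \le t$, depends only on $k$; we denote this quantity by $\mu_k$. Now

$$\begin{aligned}
\mathrm{Var}(C_i) &= E\bigl((C_i - E(C_i))^2\bigr) = E\Bigl(\Bigl(\sum_{X \in \binom{S}{i}}(\delta(X) - \mu_i)\Bigr)^2\Bigr)\\
&= \sum_{X, Y \in \binom{S}{i}} E\bigl((\delta(X) - \mu_i)(\delta(Y) - \mu_i)\bigr) = \sum_{X, Y \in \binom{S}{i}}\bigl(E(\delta(X)\delta(Y)) - \mu_i^2\bigr)\\
&\le \sum_{X, Y \in \binom{S}{i}} E(\delta(X \cup Y)) = \sum_{k=i}^{2i}\;\sum_{\substack{X, Y \in \binom{S}{i}\\ |X \cup Y| = k}} \mu_k.
\end{aligned}$$

The term-by-term validity of the last inequality above is immediate in the case
of overlapping sets $X$ and $Y$, for which $\delta(X)\delta(Y) = \delta(X \cup Y)$. When $X$ and $Y$
are disjoint, on the other hand, $\delta(X)$ and $\delta(Y)$ are independent, since $2i \le t$, so
that the left-hand term is zero. Let $i \le k \le 2i$. Since $i$ is bounded by a constant,
the number of pairs of subsets $X, Y \in \binom{S}{i}$ with $|X \cup Y| = k$ is $O(n^k)$. Moreover,
$\mu_k \le c\,s^{1-k}$, where $c = (1 + s/p)^t \le e^{ts/p}$ is bounded by a constant. Hence

$$\mathrm{Var}(C_i) \le \sum_{k=i}^{2i} O(n^k)\cdot c\,s^{1-k} = O\Bigl(\sum_{k=i}^{2i}\alpha^{k-1}n\Bigr) = O(n).$$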
If $n$ is not odd, as required by our analysis in Section 3, and $S \neq U$, we can choose
an element $x \in U \setminus S$ and compute a minimal perfect hash function $h$ for the
key set $S \cup \{x\}$. Subsequently, by adding the same constant to all displacements,
modulo $n + 1$, we “rotate” $h$ to obtain a minimal perfect hash function $h'$ for
$S \cup \{x\}$ with $h'(x) = n$. The function $h'$ is also a minimal perfect hash function
for $S$. We summarize our findings as follows.

Theorem 8. For every fixed $\epsilon > 0$, there is an integer constant $t \ge 1$ such that
for every integer $w \ge 1$, every subset $S$ of $\{0, \ldots, 2^w - 1\}$ of size $n \ge 1$ and
every given prime $p > \max(S \cup \{tn\})$ of $O(w)$ bits, we can, in $O(n)$ expected
time on a unit-cost RAM with a word length of $w$ bits, compute a positive integer
$m \le (1 + \epsilon)n$, the coefficients of functions $f \in \mathcal{H}^t_{p,n'}$ and $g \in \mathcal{H}^t_{p,m}$ and integers
$d_0, \ldots, d_{m-1}$ such that the function $x \mapsto (f(x) + d_{g(x)}) \bmod n'$ maps $S$ injectively
to $\{0, \ldots, n-1\}$. Here $n' = n$ if $n$ is odd, and $n' = n + 1$ if $n$ is even.
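The rotation used for even $n$ works because adding the same constant to every displacement adds that constant, modulo $n + 1$, to every hash value, which preserves injectivity. A minimal sketch, with the hypothetical helper `rotate_to_last`:

```python
def rotate_to_last(disp, hx, n1):
    """Shift every displacement by one constant so the padding key lands on n1 - 1.

    disp: displacements d_0, ..., d_{m-1} of h(y) = (f(y) + d_{g(y)}) mod n1;
    hx: the current value h(x) of the extra key x; n1 plays the role of n + 1.
    """
    shift = (n1 - 1 - hx) % n1
    return [(d + shift) % n1 for d in disp]
```

In a toy example with $f(y) = y \bmod 5$, $g(y) = 0$ and $d_0 = 2$, the keys $\{10, \ldots, 14\}$ are mapped bijectively onto $\{0, \ldots, 4\}$; taking $x = 10$, the shifted displacement $d_0 = 4$ sends $x$ to $4 = n$ and the remaining four keys onto $\{0, 1, 2, 3\}$.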
Abstract. We present a simple and efficient dictionary with worst case
constant lookup time, equaling the theoretical performance of the clas- 
sic dynamic perfect hashing scheme of Dietzfelbinger et al. The space 
usage is similar to that of binary search trees, i.e., three words per key 
on average. The practicality of the scheme is backed by extensive ex- 
periments and comparisons with known methods, showing it to be quite 
competitive also in the average case. 

1 Introduction 

The dictionary data structure is ubiquitous in computer science. A dictionary is 
used to maintain a set S under insertion and deletion of elements (referred to as 
keys) from a universe U. Membership queries (“$x \in S$?”) provide access to the
data. In case of a positive answer the dictionary also provides a piece of satellite 
data that was associated with x when it was inserted. 

A large literature, briefly surveyed in Sect. 1.1, is devoted to practical and
theoretical aspects of dictionaries. It is common to study the case where keys
are bit strings in $U = \{0,1\}^w$ and $w$ is the word length of the computer (for
theoretical purposes modeled as a RAM). Section 2 briefly discusses this restriction.
It is usually, though not always, clear how to return associated information
once membership has been determined. E.g., in all methods discussed in this
paper, the associated information of $x \in S$ can be stored together with $x$ in a
hash table. Therefore we disregard the time and space used to handle associated 
information and concentrate on the problem of maintaining S. In the following 
we let n denote |S|. 

The most efficient dictionaries, in theory and in practice, are based on hash- 
ing techniques. The main performance parameters are of course lookup time, 
update time, and space. In theory there is no trade-off between these. One can 
simultaneously achieve constant lookup time, expected amortized constant up- 
date time, and space within a constant factor of the information theoretical 
minimum of $B = \log\binom{|U|}{n}$ bits [3]. In practice, however, the various constant
factors are crucial for many applications. In particular, lookup time is a crit- 
ical parameter. It is well known that the expected time for all operations can 
1999-14186 (ALCOM-FT). Work initiated while visiting Stanford University.

** Basic Research in Computer Science (www.brics.dk), funded by the Danish National 
Research Foundation. 
be made a factor $(1 + \epsilon)$ from optimal (one universal hash function evaluation,
one memory lookup) if space $O(n/\epsilon)$ is allowed. Therefore the challenge is to
combine speed with a reasonable space usage. In particular, we only consider
schemes using $O(n)$ words of space.

The contribution of this paper is a new, simple hashing scheme called cuckoo 
hashing. A description and analysis of the scheme is given in Sect. 3, showing
that it possesses the same theoretical properties as the dynamic dictionary of
Dietzfelbinger et al. [7]. That is, it has worst case constant lookup time and
amortized expected constant time for updates. A special feature of the lookup 
procedure is that (disregarding accesses to a small hash function description) 
there are just two memory accesses, which are independent and can be done in 
parallel if this is supported by the hardware. Our scheme works for space similar 
to that of binary search trees, i.e., three words per key in S on average. 

Using weaker hash functions than those required for our analysis, cuckoo 
hashing is very simple to implement. Section 4 describes such an implementation,
and reports on extensive experiments and comparisons with the most commonly 
used methods, having no worst case guarantee on lookup time. Our experiments 
show the scheme to be quite competitive, especially when the dictionary is small 
enough to fit in cache. We thus believe it to be attractive in practice, when a 
worst case guarantee on lookups is desired. 

1.1 Previous Work 

Hashing, first described by Dumey [3], emerged in the 1950s as a space efficient
heuristic for fast retrieval of keys in sparse tables. Knuth surveys the most important
classical hashing methods in [?, Sect. 6.4]. These methods also seem to
prevail in practice. The most prominent, and the basis for our experiments in
Sect. 4, are Chained Hashing (with separate chaining), Linear Probing and
Double Hashing. We refer to [?, Sect. 6.4] for a general description of these
schemes, and detail our implementation in Sect. 4.

Theoretical Work. Early theoretical analysis of hashing schemes was typically 
done under the assumption that hash function values were uniformly random and 
independent. Precise analyses of the average and expected worst case behaviors 
of the above-mentioned schemes have been made, see e.g. [?]. We mention just
that for Linear Probing and Double Hashing the expected longest probe
sequence is of length $\Omega(\log n)$. In Double Hashing there is not even a bound
on the length of unsuccessful searches. For Chained Hashing the expected
maximum chain length is $\Theta(\log n/\log\log n)$.

Though the results seem to agree with practice, the randomness assumptions 
used for the above analyses are questionable in applications. Carter and Wegman
[?] succeeded in removing such assumptions from the analysis of chained
hashing, introducing the concept of universal hash function families. When im- 
plemented with a random function from Carter and Wegman’s universal family, 
chained hashing has constant expected time per dictionary operation (plus an 
amortized expected constant cost for resizing the table). 
A dictionary with worst case constant lookup time was first obtained by
Fredman, Komlós and Szemerédi [?], though it was static, i.e., did not support
updates. It was later augmented with insertions and deletions in amortized expected
constant time by Dietzfelbinger et al. [7]. Dietzfelbinger and Meyer auf
der Heide [?] improved the update performance by exhibiting a dictionary in
which operations are done in constant time with high probability, i.e., with probability
at least $1 - n^{-c}$, where $c$ is any constant of our choice. A simpler dictionary
with the same properties was later developed [?]. When $n = |U|^{1-o(1)}$, a space
usage of $O(n)$ words is not within a constant factor of the information theoretical
minimum. The dictionary of Brodnik and Munro [2] offers the same performance
as [?], using $O(B)$ bits in all cases.

Experimental Work. Although the above results leave little to improve from a 
theoretical point of view, large constant factors and complicated implementation 
hinder direct practical use. For example, the “dynamic perfect hashing” scheme 
of [7] uses more than $35n$ words of memory. The authors of [7] refer to a more
practical variant due to Wenzel that uses space comparable to that of binary
search trees. According to [?], the implementation of this variant in the LEDA
library [?], described in [?], has average insertion time larger than that of
AVL trees for $n < 2^{\ldots}$, and is more than four times slower than insertion in
chained hashing. The experimental results listed in [?, Table 5.2] show a gap
of more than a factor of 6 between the update performance of chained hashing
and dynamic perfect hashing, and a factor of more than 2 for lookups.²

Silverstein [?] explores ways of improving space as well as time of the dynamic
perfect hashing scheme of [7], improving both the observed time and space
by a factor of roughly three. Still, the improved scheme needs 2 to 3 times more
space than linear probing to achieve similar time per operation. It should be
noted that emphasis in [?] is very much on space efficiency. For example, the
hash tables of both methods are stored in a packed representation, presumably 
slowing down linear probing considerably. 

A survey of experimental work on dictionaries that do not have worst case 
constant lookup time is beyond the scope of this paper. However, we do remark 
that Knuth’s selection of algorithms seems to be in agreement with current 
practice for implementation of general purpose dictionaries. In particular, the 
excellent cache usage of Linear Probing makes it a prime choice on modern 
architectures. 

2 Preliminaries 

Our algorithm uses hash functions from a universal family. 

Definition 1. A family $\{h_i\}_{i \in I}$, $h_i : U \to R$, is $(c, k)$-universal if, for any $k$
distinct elements $x_1, \ldots, x_k \in U$, any $y_1, \ldots, y_k \in R$, and uniformly random
$i \in I$, $\Pr[h_i(x_1) = y_1, \ldots, h_i(x_k) = y_k] \le c/|R|^k$.

¹ On a Linux PC with an Intel Pentium 120 MHz processor.

² On a 300 MHz SUN ULTRA SPARC.
A standard construction of a $(2, k)$-universal family for $U = \{0, \ldots, p-1\}$ and
range $R = \{0, \ldots, r-1\}$, where $p$ is prime, contains, for every choice of
$0 \le a_0, a_1, \ldots, a_{k-1} < p$, the function $h(x) = \bigl(\sum_{l=0}^{k-1} a_l x^l \bmod p\bigr) \bmod r$.
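A short Python sketch of this polynomial family, together with a brute-force check of the bound in Definition 1 with $c = 2$ for tiny parameters. The functions `poly_hash` and `max_collision_prob` are hypothetical names; enumerating the whole family is only feasible for very small $p$ and $k$.

```python
import itertools


def poly_hash(coeffs, p, r):
    """h(x) = ((sum_l a_l x^l) mod p) mod r for coeffs = [a_0, ..., a_{k-1}]."""
    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule, reducing modulo p at each step
            acc = (acc * x + a) % p
        return acc % r
    return h


def max_collision_prob(k, p, r, xs):
    """max over (y_1..y_k) of Pr[h(x_1) = y_1, ..., h(x_k) = y_k], enumerating all p^k members."""
    counts = {}
    for coeffs in itertools.product(range(p), repeat=k):
        h = poly_hash(coeffs, p, r)
        key = tuple(h(x) for x in xs)
        counts[key] = counts.get(key, 0) + 1
    return max(counts.values()) / p ** k
```

For $p = 11$, $r = 3$ and $k = 2$, the largest probability is $16/121$, below the bound $2/9$ of Definition 1 with $c = 2$.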

We assume that keys from $U$ fit in a single machine word, i.e., $U = \{0,1\}^w$.
This is not a serious restriction, as long keys can be mapped to short keys by
choosing a random function from a $(O(1), 2)$-universal family for each word of
the key, mapping the key to the bitwise exclusive or of the individual function
values [?]. A function chosen in this way can be used to map $S$ injectively to
$\{0,1\}^{2\log n + O(1)}$, effectively reducing the universe size to $O(n^2)$. In fact, with
constant probability the function is injective on a given sequence of $n$ consecutive
sets in a dictionary (see [?]). A result of Siegel [?] says that for any constant
$\epsilon > 0$, if the universe is of size $n^{O(1)}$, there is an $(O(1), O(\log n))$-universal family
that can be evaluated in constant time, using space and initialization time $O(n^\epsilon)$.
However, the constant factor of the evaluation time is rather high.
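The word-wise reduction for long keys can be sketched as follows. This is an illustration under stated assumptions: each word is hashed by an independent function $x \mapsto ((a x + b) \bmod P) \bmod 2^{\text{bits}}$ with the Mersenne prime $P = 2^{61} - 1$ (our choice, assuming every word value is below $P$), and the per-word results are combined by bitwise exclusive or. The names `make_word_hash` and `make_long_key_hash` are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; word values are assumed to be smaller than P


def make_word_hash(bits, rng):
    """x -> ((a*x + b) mod P) mod 2^bits with a, b uniform in {0, ..., P-1}."""
    a, b = rng.randrange(P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % (1 << bits)


def make_long_key_hash(num_words, bits, rng=random):
    """Hash a tuple of num_words words to `bits` bits by XOR-ing independent word hashes."""
    word_hashes = [make_word_hash(bits, rng) for _ in range(num_words)]

    def h(key):
        out = 0
        for hw, word in zip(word_hashes, key):
            out ^= hw(word)
        return out

    return h
```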

We reserve a special value $\bot \in U$ to signal an empty cell in hash tables. For
Double Hashing an additional special value is used to indicate a deleted key.

3 Cuckoo Hashing 

Cuckoo hashing is a dynamization of a static dictionary described in [?]. The
dictionary uses two hash tables, $T_1$ and $T_2$, of length $r$ and two hash functions
$h_1, h_2 : U \to \{0, \ldots, r-1\}$. Every key $x \in S$ is stored in cell $h_1(x)$ of $T_1$ or $h_2(x)$
of $T_2$, but never in both. Our lookup function is

  function lookup(x)
    return T1[h1(x)] = x ∨ T2[h2(x)] = x;
  end;

We remark that the idea of storing keys in one out of two places given by
hash functions previously appeared in [?] in the context of PRAM simulation,
and in [?] for a variant of chained hashing. It is shown in [?] that if $r \ge (1 + \epsilon)n$
for some constant $\epsilon > 0$ (i.e., the tables are to be a bit less than half full), and
$h_1$, $h_2$ are picked uniformly at random from an $(O(1), O(\log n))$-universal family,
the probability that there is no way of arranging the keys of $S$ according to $h_1$
and $h_2$ is $O(1/n)$. A slightly weaker conclusion, not sufficient for our purposes,
was derived in [?]. A suitable arrangement was shown in [?] to be computable
in linear time by a reduction to 2-SAT.

We now consider a simple dynamization of the above. Deletion is of course 
simple to perform in constant time, not counting the possible cost of shrinking 
the tables if they are becoming too sparse. As for insertion, it turns out that the 
“cuckoo approach”, kicking other keys away until every key has its own “nest”, 
works very well. Specifically, if $x$ is to be inserted we first see if cell $h_1(x)$ of
$T_1$ is occupied. If not, we are done. Otherwise we set $T_1[h_1(x)] \leftarrow x$ anyway,
thus making the previous occupant “nestless”. This key is then inserted in $T_2$
in the same way, and so forth. As it may happen that this process loops, the 
number of iterations is bounded by a value “MaxLoop” to be specified below. 
If this number of iterations is reached, everything is rehashed with new hash
functions, and we try once again to accommodate the nestless key. Using the
notation $x \leftrightarrow y$ to express that the values of variables $x$ and $y$ are swapped, the
following code summarizes the insertion procedure.

  procedure insert(x)
    if lookup(x) then return;
    loop MaxLoop times
      if T1[h1(x)] = ⊥ then { T1[h1(x)] ← x; return; }
      x ↔ T1[h1(x)];
      if T2[h2(x)] = ⊥ then { T2[h2(x)] ← x; return; }
      x ↔ T2[h2(x)];
    end loop
    rehash(); insert(x);
  end;

The above procedure assumes that the tables remain larger than $(1 + \epsilon)n$ cells.
When no such bound is known, a test must be done to find out when a rehash
to larger tables is needed. Note that the insertion procedure is biased towards
inserting keys in $T_1$. As seen in Section 4, this leads to faster successful lookups.
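A runnable Python sketch of the lookup and insertion procedures above; it is not the authors’ implementation. Assumptions, also noted in the code: keys are Python integers, `None` plays the role of $\bot$, $h_1$ and $h_2$ are random degree-2 polynomials modulo the Mersenne prime $2^{61} - 1$ (much weaker than the $(O(1), O(\log n))$-universal functions required by the analysis), MaxLoop is set to about $3\log_{1+\epsilon} r$ (a hypothetical choice), and both tables double whenever $r < (1 + \epsilon)n$. Here rehash() is realized by drawing new functions and reinserting all keys.

```python
import math
import random


class CuckooSet:
    """Sketch of cuckoo hashing with two tables T1, T2 of length r (integer keys).

    Assumptions: None marks an empty cell; h1, h2 are random degree-2 polynomials
    mod a Mersenne prime; MaxLoop ~ 3*log_{1+eps}(r); tables double if r < (1+eps)n.
    """

    P = (1 << 61) - 1

    def __init__(self, r=8, eps=0.5, rng=None):
        self.r, self.eps, self.n = r, eps, 0
        self.rng = rng or random.Random()
        self._fresh_tables()

    def _fresh_tables(self):
        self.T1, self.T2 = [None] * self.r, [None] * self.r
        self.f1 = [self.rng.randrange(self.P) for _ in range(3)]
        self.f2 = [self.rng.randrange(self.P) for _ in range(3)]
        self.max_loop = 3 * math.ceil(math.log(self.r) / math.log(1 + self.eps)) + 1

    def _h(self, coeffs, x):
        acc = 0
        for a in coeffs:  # Horner's rule modulo P
            acc = (acc * x + a) % self.P
        return acc % self.r

    def lookup(self, x):
        return self.T1[self._h(self.f1, x)] == x or self.T2[self._h(self.f2, x)] == x

    def _kick(self, x):
        """Run at most MaxLoop rounds; return None on success, else the nestless key."""
        for _ in range(self.max_loop):
            i = self._h(self.f1, x)
            x, self.T1[i] = self.T1[i], x  # x <-> T1[h1(x)]
            if x is None:
                return None
            j = self._h(self.f2, x)
            x, self.T2[j] = self.T2[j], x  # x <-> T2[h2(x)]
            if x is None:
                return None
        return x

    def _rehash(self, new_r, pending):
        keys = [y for y in self.T1 + self.T2 if y is not None] + [pending]
        self.r = new_r
        while True:  # draw new hash functions until every key finds a nest
            self._fresh_tables()
            if all(self._kick(y) is None for y in keys):
                return

    def insert(self, x):
        if self.lookup(x):
            return
        self.n += 1
        if self.r < (1 + self.eps) * self.n:
            self._rehash(2 * self.r, x)  # grow the tables; x is placed during the rebuild
            return
        nestless = self._kick(x)
        if nestless is not None:
            self._rehash(self.r, nestless)  # MaxLoop reached: rehash with new functions

    def delete(self, x):
        i, j = self._h(self.f1, x), self._h(self.f2, x)
        if self.T1[i] == x:
            self.T1[i] = None
        elif self.T2[j] == x:
            self.T2[j] = None
        else:
            return
        self.n -= 1
```

Deletion simply clears the cell, as described in the text; shrinking the tables when they become sparse is omitted.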



3.1 Analysis 

We first show that if the insertion procedure loops for MaxLoop $= \infty$, it is not possible to accommodate all the keys of the new set using the present hash functions. Consider the sequence $a_1, a_2, \ldots$ of nestless keys in the infinite loop. For $i, j \ge 1$ we define $A_{i,j} = \{a_i, \ldots, a_j\}$. Let $j$ be the smallest index such that $a_j \in A_{1,j-1}$. At the time when $a_j$ becomes nestless for the second time, the change in the tables relative to the configuration before the insertion is that $a_k$ is now in the previous location of $a_{k+1}$, for $1 \le k < j$. Let $i < j$ be the index such that $a_i = a_j$. We now consider what happens when $a_j$ is nestless for the second time. If $i > 1$ then $a_j$ reclaims its previous location, occupied by $a_{i-1}$. If $i > 2$ then $a_{i-1}$ subsequently reclaims its previous position, which is occupied by $a_{i-2}$, and so forth. Thus we have $a_{j+z} = a_{i-z}$ for $z = 0, 1, \ldots, i-1$, and end up with $a_1$ occurring again as $a_{i+j-1}$. Define $s_k = |h_1[A_{1,k}]| + |h_2[A_{1,k}]|$, i.e., the number of table cells available to $A_{1,k}$. Obviously $s_k \le s_{k-1} + 1$, as every key $a_i$, $i > 1$, has either $h_1(a_i) = h_1(a_{i-1})$ or $h_2(a_i) = h_2(a_{i-1})$. In fact, $s_{j-1} = s_{j-2} \le j - 1$, because the key $a_j$ found in $T_1[h_1(a_{j-1})]$ or $T_2[h_2(a_{j-1})]$ occurred earlier in the sequence. As all of the keys $a_j, \ldots, a_{j+i-1}$ appeared earlier in the sequence, we have $s_{j+i-2} = s_{j-2}$. Let $j'$ be the minimum index such that $j' > j + i - 1$ and $a_{j'} \in A_{1,j'-1}$. Similar to before we have $s_{j'-1} = s_{j'-2}$. In conclusion, $|A_{1,j'-1}| = j' - i$ and

$$s_{j'-1} = s_{j'-2} \le s_{j+i-2} + (j'-2) - (j+i-2) = s_{j-2} + j' - j - i < j' - i .$$

Thus, there are not sufficiently many cells to accommodate $A_{1,j'-1}$ for the current choice of hash functions.
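The counting argument above can be checked mechanically on concrete hash values. The following C sketch uses hypothetical helper names and takes the hash values as precomputed arrays. It computes $s = |h_1[A]| + |h_2[A]|$ for a key set $A$ and reports when $|A| > s$. In that case no placement of $A$ exists for the current hash functions. Note that $|A| \le s$ is only a necessary condition for a placement, not a sufficient one.

```c
#include <stddef.h>
#include <stdlib.h>

/* s = |h1[A]| + |h2[A]|: the number of cells available to the key set A,
   given h1v[k] = h1(a_k) and h2v[k] = h2(a_k), each in [0, r). */
static size_t cells_available(const unsigned *h1v, const unsigned *h2v,
                              size_t n, size_t r) {
    unsigned char *seen1 = calloc(r, 1), *seen2 = calloc(r, 1);
    size_t s = 0;
    if (!seen1 || !seen2) abort();
    for (size_t k = 0; k < n; k++) {
        if (!seen1[h1v[k]]) { seen1[h1v[k]] = 1; s++; }
        if (!seen2[h2v[k]]) { seen2[h2v[k]] = 1; s++; }
    }
    free(seen1);
    free(seen2);
    return s;
}

/* Pigeonhole check: if |A| > s, the keys of A cannot all be stored for
   this choice of h1 and h2, so only a rehash can resolve the insertion. */
static int cannot_accommodate(const unsigned *h1v, const unsigned *h2v,
                              size_t n, size_t r) {
    return n > cells_available(h1v, h2v, n, r);
}
```

For instance, three keys that share both of their cells give $s = 2 < 3$. This is the smallest configuration on which the insertion loop cycles forever.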

In conjunction with the result from [?], the above shows that the insertion procedure loops without limit with probability $O(1/n)$. We now turn to the analysis for the case where there is no such loop, showing that the insertion procedure terminates in $O(1)$ iterations, in the expected sense. Consider a prefix $a_1, a_2, \ldots, a_l$ of the sequence of nestless keys. The crucial fact is that there must be a subsequence of at least $l/3$ keys without repetitions, starting with an occurrence of the key $a_1$, i.e., the inserted key. As earlier, we pick $i$ and $j$, $i < j$, such that $a_i = a_j$ and $j$ is minimal, and once again we have $a_{j+z} = a_{i-z}$ for $z = 0, 1, \ldots, i-1$. There can be no index $j' > j + i - 1$ such that $a_{j'} \in A_{1,j'-1}$, since our earlier argument showed that the set cannot be accommodated when such indices $i$, $j$ and $j'$ can be chosen. This means that both of the sequences $a_1, \ldots, a_{j-1}$ and $a_{j+i-1}, \ldots, a_l$ have no repetitions. As $a_1 = a_{j+i-1}$ and $i < j$, one of the sequences must be the desired one of length at least $l/3$.

Suppose that the insertion loop runs for at least $t$ iterations. By the above there is a sequence of distinct keys $b_1, \ldots, b_m$, $m \ge (2t-1)/3$, such that $b_1$ is the key to be inserted, and such that for some $\beta \in \{0,1\}$

$$h_{2-\beta}(b_1) = h_{2-\beta}(b_2),\quad h_{1+\beta}(b_2) = h_{1+\beta}(b_3),\quad h_{2-\beta}(b_3) = h_{2-\beta}(b_4),\quad \ldots \qquad (1)$$

Given $b_1$ there are at most $n^{m-1}$ sequences of $m$ distinct keys. For any such sequence and any $\beta \in \{0,1\}$, if the hash functions were chosen from a $(c,m)$-universal family, the probability that (1) holds is bounded by $c\,r^{-(m-1)}$. Thus, the probability that there is any sequence of length $m$ satisfying (1) is bounded by $2c\,(n/r)^{m-1} \le 2c\,(1+\varepsilon)^{-(2t-1)/3+1}$. Suppose we use a $(c, 6\log_{1+\varepsilon} n)$-universal family, for some constant $c$ (e.g., Siegel's family with constant time evaluation [?]). Then the probability of more than $3\log_{1+\varepsilon} n$ iterations is $O(1/n^2)$. Thus, we can set MaxLoop $= 3\log_{1+\varepsilon} n$ with a negligible increase in the probability of a rehash. When there is no rehash the expected number of iterations is at most

$$1 + \sum_{t=2}^{\infty} 2c\,(1+\varepsilon)^{-(2t-1)/3+1} = O(1 + 1/\varepsilon).$$
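The last equality is a geometric series. The following worked derivation is our own filling-in of the omitted step. It assumes $r \ge (1+\varepsilon)n$ as above and uses only $e^{y} \ge 1 + y$ and $\ln(1+\varepsilon) \ge \varepsilon/(1+\varepsilon)$. The second line checks the choice of MaxLoop for fixed $\varepsilon$.

```latex
\begin{aligned}
1 + \sum_{t=2}^{\infty} 2c\,(1+\varepsilon)^{-\frac{2t-1}{3}+1}
  &= 1 + \frac{2c}{1-(1+\varepsilon)^{-2/3}}
  && \text{(first term $2c$, ratio $(1+\varepsilon)^{-2/3}$)}\\
  &\le 1 + 2c\Bigl(1 + \frac{3}{2\ln(1+\varepsilon)}\Bigr)
  && \text{($e^{-y} \le \tfrac{1}{1+y}$ with $y = \tfrac{2}{3}\ln(1+\varepsilon)$)}\\
  &\le 1 + 2c\Bigl(1 + \tfrac{3}{2}\bigl(1 + \tfrac{1}{\varepsilon}\bigr)\Bigr)
   = 1 + 5c + \frac{3c}{\varepsilon} = O(1 + 1/\varepsilon),\\[4pt]
t > 3\log_{1+\varepsilon} n
  &\;\Rightarrow\; m - 1 \ge \tfrac{2t-1}{3} - 1 > 2\log_{1+\varepsilon} n - \tfrac{4}{3}
   \;\Rightarrow\; 2c\,(1+\varepsilon)^{-(m-1)} < 2c\,(1+\varepsilon)^{4/3}\,n^{-2} = O(1/n^2).
\end{aligned}
```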

A rehash has no failed insertions with probability $1 - O(1/n)$. In this case, the expected time per insertion is constant, so the expected time is $O(n)$. As the probability of having to start over with new hash functions is bounded away from 1, the total expected time for a rehash is $O(n)$. This implies that the expected time for insertion is constant if $r \ge (1+\varepsilon)(n+1)$. Resizing of tables can be done in amortized expected constant time per update by the usual doubling/halving technique.



4 Experiments 

To examine the practicality of Cuckoo Hashing we experimentally compare it to three well known hashing methods, Chained Hashing (with separate chaining), Linear Probing and Double Hashing, as described in [?, Sect. 6.4]. We also consider Two-Way Chaining [?], implemented in a cache-friendly way, as recently suggested in [?].

4.1 Data Structure Design and Implementation

We consider positive 32 bit signed integer keys and use 0 as ⊥. The data structures are robust in that they correctly handle attempts to insert an element already in the set, and attempts to delete an element not in the set. A slightly faster implementation can be obtained if this is known not to occur.

Our focus is on achieving high performance dictionary operations with a reasonable space usage. By the load factor of a dictionary we will understand the size of the set relative to the memory used¹. As seen in [?, Fig. 44] there is not much to be gained in terms of average number of probes for the classic schemes by going for load factor below, say, 1/2 or 1/3. As Cuckoo Hashing only works when the size of each table is larger than the size of the set, we can only perform a comparison for load factors less than 1/2. To allow for doubling and halving of the table size, we allow the load factor to vary between 1/5 and 1/2, focusing especially on the “typical” load factor of 1/3. For Cuckoo Hashing and Two-Way Chaining there is a chance that an insertion may fail, causing a “forced rehash”. If the load factor is larger than a certain threshold, somewhat arbitrarily set to 5/12, we use the opportunity to double the table size. By our experiments this only slightly decreases the average load factor.

Apart from Chained Hashing, the schemes considered have in common the fact that they have only been analyzed under randomness assumptions that are currently, or inherently, impractical to implement ($O(\log n)$-wise independence or $n$-wise independence). However, experience shows that rather simple and efficient hash function families yield performance close to that predicted under stronger randomness assumptions. We use a function family from [?] with range $\{0,1\}^q$ for positive integer $q$. For every odd $a$, $0 < a < 2^w$, the family contains the function $h_a(x) = (ax \bmod 2^w) \operatorname{div} 2^{w-q}$, where $w$ is the word length (here $w = 32$). Note that evaluation can be done by a 32 bit multiplication and a shift. This choice of hash function restricts us to consider hash tables whose sizes are powers of two. A random function from the family (chosen using C's rand function) appears to work fine with all schemes except Cuckoo Hashing. For Cuckoo Hashing we found that using a $(1,3)$-universal family resulted in fewer forced rehashes than when using a $(1,2)$-universal family. However, it turned out that the exclusive or of three independently chosen functions from the family of [?] was faster and worked equally well. We have no good explanation for this phenomenon. For all schemes, various other families were tried, with a decrease in performance.
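The hash family above is cheap to evaluate. The following C sketch shows one evaluation of $h_a$, and the exclusive or of three independently chosen functions that was used for Cuckoo Hashing. The function names are ours, and the sketch assumes $w = 32$, an odd multiplier $a$ and $1 \le q \le 31$.

```c
#include <stdint.h>

/* h_a(x) = (a*x mod 2^w) div 2^(w-q) with w = 32: the unsigned 32-bit
   product wraps modulo 2^32, and the shift keeps the top q bits.
   Assumes a is odd and 1 <= q <= 31. */
static uint32_t hash_mul(uint32_t a, uint32_t x, unsigned q) {
    uint32_t prod = a * x;
    return prod >> (32u - q);
}

/* Exclusive or of three independently chosen functions from the family;
   the result still lies in {0, ..., 2^q - 1}. */
typedef struct {
    uint32_t a[3]; /* three odd multipliers, e.g. drawn at random */
} xor3_hash;

static uint32_t hash_xor3(const xor3_hash *f, uint32_t x, unsigned q) {
    return hash_mul(f->a[0], x, q) ^ hash_mul(f->a[1], x, q) ^
           hash_mul(f->a[2], x, q);
}
```

Because each of the three values lies in $\{0,1\}^q$, their exclusive or indexes a table of the same power-of-two size.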

All methods have been implemented in C. We have striven to obtain the 
fastest possible implementation of each scheme. Details differing from the refer- 
ences and specific choices made are: 

Chained Hashing. We store the first element of each linked list directly in the hash table. This often saves one cache miss, and slightly decreases memory usage, in the expected sense, as every non-empty chained list is one element shorter. C's malloc and free functions were found to be a performance bottleneck, so a simple “free list” memory allocation scheme is used. Half of the allocated memory is used for the hash table, and half for list elements. If the data structure runs out of free list elements, its size is doubled.
Double Hashing. Deletions are handled by putting a “deleted” marker in the cell of the deleted key. Queries skip over deleted cells, while insertions overwrite them. To prevent the tables from clogging up with deleted cells, resulting in poor performance for unsuccessful lookups, all keys are rehashed when 2/3 of the hash table is occupied by keys and “deleted” markers.
Two-Way Chaining. We allow four keys in each bucket. This is enough to keep the probability of a forced rehash low for hundreds of thousands of keys, by the results in [?]. For larger collections of keys one should allow more keys in each bucket, resulting in general performance degradation.
Cuckoo Hashing. The architecture on which we experimented could not parallelize the two memory accesses in lookups. Therefore we only evaluate the second hash function after the first memory lookup has been unsuccessful.

¹ For Chained Hashing, the notion of load factor traditionally disregards the space used for chained lists, but we desire equal load factors to imply equal memory usage.

Some experiments were done with variants of Cuckoo Hashing. In particular, we considered Asymmetric Cuckoo, in which the first table is twice the size of the second one. This results in more keys residing in the first table, thus giving a slightly better average performance for successful lookups. For example, after a long sequence of alternate insertions and deletions at load factor 1/3, we found that about 76% of the elements resided in the first table of Asymmetric Cuckoo, as opposed to 63% for Cuckoo Hashing. There is no significant slowdown for other operations. We will describe the results for Asymmetric Cuckoo when they differ significantly from those of Cuckoo Hashing.



4.2 Setup and Results 

Our experiments were performed on a PC running Linux (kernel version 2.2) with an 800 MHz Intel Pentium III processor, and 256 MB of memory (PC100 RAM). The processor has a 16 KB level 1 data cache and a 256 KB level 2 “advanced transfer” cache. Our results can be explained in terms of processor, cache and memory speed in our machine, and are thus believed to have significance for other configurations. An advantage of using the Pentium processor for timing experiments is its rdtsc instruction which can be used to measure time in clock cycles. This gives access to very precise data on the behavior of functions. Programs were compiled using the gcc compiler version 2.95.2, using optimization flags -O9 -DCPU=586 -march=i586 -fomit-frame-pointer -finline-functions -fforce-mem -funroll-loops -fno-rtti. As mentioned
earlier, we use a global clock cycle counter to time operations. If the number 
of clock cycles spent exceeds 5000, and there was no rehash, we conclude that 
the call was interrupted, and disregard the result (it was empirically observed 
that no operation ever took between 2000 and 5000 clock cycles). If a rehash is 
made, we have no way of filtering away time spent in interrupts. However, all 
tests were made on a machine with no irrelevant user processes, so disturbances 
should be minimal. 

Our first test was designed to model the situation in which the size of the dictionary is not changing too much. It considers a sequence of mixed operations generated at random. We constructed the test operation sequences from a collection of high quality random bits publicly available on the Internet [?]. The sequences start by insertion of n distinct random keys, followed by 3n times four operations: a random unsuccessful lookup, a random successful lookup, a random deletion, and a random insertion. We timed the operations in the “equilibrium”, where the number of elements is stable. For load factor 1/3 our results appear in Fig. 1, which shows an average over 10 runs. As Linear Probing was consistently faster than Double Hashing, we chose it as the sole open addressing scheme in the plots. Time for forced rehashes was added to the insertion time. Results had a large variance for sets of size 2^^ to 2^®; outside this range the extreme values deviated from the average by less than about 7%.
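The structure of the first test can be sketched as an operation generator. This C sketch is illustrative only. The experiments used externally obtained random bits, whereas here a xorshift generator stands in for them. Fresh, distinct keys are produced by multiplying a counter by an odd constant modulo $2^{31}$, which is a bijection on positive 31-bit values. All names and constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { OP_INSERT, OP_LOOKUP_HIT, OP_LOOKUP_MISS, OP_DELETE } op_kind;
typedef struct {
    op_kind kind;
    uint32_t key;
} op;

static uint32_t rng_state = 2463534242u; /* stand-in PRNG (xorshift32) */
static uint32_t xorshift32(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Distinct positive keys: k -> k * odd mod 2^31 is a bijection on
   31-bit values, and k never repeats, so every fresh key is new. */
static uint32_t key_counter = 0;
static uint32_t fresh_key(void) {
    return (++key_counter * 0x9E3779B1u) & 0x7FFFFFFFu;
}

/* Writes n insertions followed by 3n rounds of (unsuccessful lookup,
   successful lookup, deletion, insertion) into out, which must hold
   13n entries; the set size stays n. Returns the number of operations. */
static size_t make_workload(size_t n, op *out) {
    uint32_t *live = malloc(n * sizeof *live);
    size_t k = 0;
    if (n == 0 || !live) { free(live); return 0; }
    for (size_t i = 0; i < n; i++) {
        live[i] = fresh_key();
        out[k++] = (op){OP_INSERT, live[i]};
    }
    for (size_t round = 0; round < 3 * n; round++) {
        out[k++] = (op){OP_LOOKUP_MISS, fresh_key()};
        out[k++] = (op){OP_LOOKUP_HIT, live[xorshift32() % n]};
        size_t i = xorshift32() % n;
        out[k++] = (op){OP_DELETE, live[i]};
        live[i] = fresh_key(); /* the deleted slot receives a new key */
        out[k++] = (op){OP_INSERT, live[i]};
    }
    free(live);
    return k;
}
```

Timing would then be taken only over the 12n operations after the initial n insertions, which matches the "equilibrium" measurement described above.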

As can be seen, the time for lookups is almost identical for all schemes as 
long as the entire data structure resides in level 2 cache. After this the aver- 
age number of random memory accesses (with the probability of a cache miss 
approaching 1) shows up. Filling a cache line seems to take around 160 clock cy- 
cles, with the memory location looked up arriving at the processor after about 80 
clock cycles on average. This makes Linear Probing an average case winner, with Cuckoo Hashing and Two-Way Chaining following about half a cache miss behind. For insertion the number of random memory accesses again dominates the picture for large sets, while the higher number of in-cache accesses and more computation makes Cuckoo Hashing, and in particular Two-Way Chaining, relatively slow for small sets. The cost of forced rehashes sets in for Two-Way Chaining for sets of more than a million elements, at which point better results may have been obtained by a larger bucket size. For deletion Chained Hashing lags behind for large sets due to random memory accesses when freeing list elements, while the simplicity of Cuckoo Hashing makes it the fastest
scheme. We believe that the slight rise in time for the largest sets in the test 
is due to saturation of the bus, as the machine runs out of memory and begins 
swapping. It is interesting to note that all schemes would run much faster if the 
random memory accesses could bypass the cache (using perhaps 20 clock cycles 
per random memory access on our machine). 

The second test concerns the cost of insertions in growing dictionaries and deletions in shrinking dictionaries. Together with Fig. 1 this should give a fairly complete picture of the performance of the data structures under general sequences of operations. The first operation sequence inserts n distinct random keys, while the second one deletes them. The plot is shown in Fig. 2. For small sets the time per operation seems unstable, and dominated by memory allocation overhead (if minimum table size 2^^ is used, the curves become monotone). For sets of more than 2^^ elements the largest deviation from the averages over 10 runs was about 6%. Disregarding the constant minimum amount of memory used by any dictionary, the average load factor during insertions was within 2% of 1/3 for all schemes except Chained Hashing, whose average load factor was about 0.31. During deletions all schemes had average load factor 0.28. Again the winner is Linear Probing. We believe this is largely due to very fast rehashes.

Fig. 1. The average time per operation in equilibrium for load factor 1/3.

Fig. 2. The average time per insertion/deletion in a growing/shrinking dictionary for average load factor ≈ 1/3.

Access to data in a dictionary is rarely random in practice. In particular, 
the cache is more helpful than in the above random tests, for example due to 
repeated lookups of the same key, and quick deletions. As a rule of thumb, the 
time for such operations will be similar to the time when all of the data structure 
is in cache. To perform actual tests of the dictionaries on more realistic data, we chose a representative subset of the dictionary tests of the 5th DIMACS implementation challenge [?]. The tests involving string keys were preprocessed by hashing strings to 32 bit integers, preserving the access pattern to keys. Each test was run six times; minimum and maximum average time per operation can be found in Table 1, which also lists the average load factor. Linear Probing is again the fastest, but mostly only 20-30% faster than the Cuckoo schemes.



Table 1. Average clock cycles per operation and load factors for the DIMACS tests. 

We have seen that the number of random memory accesses (i.e., cache misses) is critical to the performance of hashing schemes. Whereas there is a very precise understanding of the probe behavior of the classic schemes (under suitable randomness assumptions), the analysis of the expected time for insertions in Sect. 3.1 is rather crude, establishing just a constant upper bound. Figure 3 shows experimentally determined values for the average number of probes during insertion for various schemes and load factors below 1/2. We disregard reads and writes to locations known to be in cache, and the cost of rehashes. Measurements were made in “equilibrium” after 10® insertions and deletions, using tables of size 2^® and truly random hash function values. It is believed that this curve is independent of the table size (up to vanishing terms). The curve for Linear Probing does not appear, as the number of non-cached memory accesses depends on cache architecture (length of the cache line), but it is typically very close to 1. It should be remarked that the highest load factor for Two-Way Chaining is $O(1/\log\log n)$.

5 Conclusion

We have presented a new dictionary with worst case constant lookup time. It is 
very simple to implement, and has average case performance comparable to the 
best previous dictionaries. Earlier schemes with worst case constant lookup time 
were more complicated to implement and had considerably worse average case 
performance. Several challenges remain. First of all an explicit practical hash 
function family that is provably good for the scheme has yet to be found. Sec- 
ondly, we lack a precise understanding of why the scheme exhibits low constant 
factors. In particular, the curve of Fig. 3 and the fact that forced rehashes are
rare for load factors quite close to 1/2 need to be explained. 

Abstract. Variable fixing is an important technique when solving combinatorial
optimization problems. Unique profitable variable values are detected with respect 
to the objective function and to the constraint structure of the problem. Relying on 
that specific structure, effective variable fixing algorithms (VFAs) are only suited 
for the problems they have been designed for. Frequently, new combinatorial 
optimization problems evolve as a combination of simpler structured problems. 
For such combinations, we show how VFAs for linear optimization problems can 
be coupled via Lagrangian relaxation. The method is applied on a multimedia 
problem incorporating a knapsack and a maximum weighted stable set problem. 



1 Introduction 

Reduction algorithms are of great importance when combinatorial optimization problems have to be solved exactly. The tightening of problem formulations within a branch-and-bound approach improves on the quality of the bounds computed as well as on the approach's robustness. Given a maximization problem $P(x)$ where $x \in \{0,1\}^n$, $n \in \mathbb{N}$, the idea of variable fixing is to use upper bound information to detect unique profitable assignments for a variable: if an upper bound on $P(x \mid x_i = k)$, $k \in \{0,1\}$, drops below the best known solution value, then we can set $x_i \leftarrow 1 - k$.
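The fixing rule just stated can be written as a small decision function. The following C sketch assumes the two upper bounds are computed elsewhere, and the name `fix_variable` and its return codes are ours. If both bounds fall below the incumbent, no completion of the current search node can improve on it, so the node can be pruned.

```c
/* Variable fixing for a maximization problem P(x), x in {0,1}^n.
   ub0 and ub1 are upper bounds on P(x | x_i = 0) and P(x | x_i = 1),
   best is the value of the incumbent (best known) solution.
   Returns the value to fix x_i to, FIX_NONE, or FIX_PRUNE. */
enum { FIX_NONE = -1, FIX_PRUNE = -2 };

static int fix_variable(double ub0, double ub1, double best) {
    if (ub0 < best && ub1 < best) return FIX_PRUNE; /* neither value helps */
    if (ub1 < best) return 0; /* x_i = 1 cannot beat the incumbent: x_i <- 0 */
    if (ub0 < best) return 1; /* x_i = 0 cannot beat the incumbent: x_i <- 1 */
    return FIX_NONE;
}
```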

Frequently, constraints of optimization problems can be grouped such that the overall 
problem can be viewed as a combination of two or more simpler structured problems. 
Assuming that efficient variable fixing algorithms (VFAs) for these subproblems exist,
their independent application usually does not yield an effective algorithm to perform 
variable fixing for the combined problem. The reason for this is that tight bounds on the 
objective cannot be obtained by taking only a subset of the constraints into account. 

For a multimedia application incorporating a knapsack problem (KP) and a maximum
weighted stable set problem (MWSSP) on an interval graph, we show by example how
two VFAs for linear optimization problems can be coupled via Lagrangian relaxation to 
achieve an effective reduction algorithm for the combined problem. 

The paper is structured as follows: In Section 2 we introduce the Automatic Recording Problem, which can be viewed as a combination of a knapsack problem and a MWSSP on interval graphs. In Section 3 we introduce an efficient VFA for the latter problem. Then, in Section 4 we show how it can be coupled with a previously developed VFA for KP via Lagrangian relaxation. Finally, in Section 5 we give numerical results.



2 The Automatic Recording Problem 

The Automatic Recording Problem (ARP) is an example of a problem that is constituted
by two simpler substructures. We focus on algorithms that solve the problem exactly 
and give a tightened formulation of the ARP as an integer program (IP). 

The technology of digital television offers new possibilities for individualized services that cannot be provided by today's analog broadcasts. Additional information like classification of content, or starting and ending times can be submitted within the digital broadcast stream. With this information at hand, new services can be provided that make use of individual profiles and maximize customer satisfaction.

One service, which is already available today, is an "intelligent" digital video recorder that is aware of its users' preferences and records automatically (see [?]). The recorder tries to match a given user profile with the information submitted by the different TV channels. E.g., a user may be interested in thrillers, the more recent the better. The digital video recorder is supposed to record movies such that the users' satisfaction is maximized. As the number of channels may be enormous (more than 100 digital channels are possible), a service that automatically provides an individual selection is highly appreciated and the subject of current research activities (for example within projects like UP-TV funded by the European Union or the TV-Anytime Forum).

In this context, two restrictions have to be met. First, the storage capacity is limited (10 h of MPEG-2 video need about 18 GB). Second, only one video can be recorded at a time. More formally, we define the problem as follows:

Definition 1. Let $n \in \mathbb{N}$, $V = \{0, \ldots, n-1\}$ the set of movies, $start(i) < end(i)$ $\forall i \in V$ the corresponding starting and ending times, $w = (w_i)_{0 \le i < n} \in \mathbb{R}_+^n$ the storage requirements, $K \in \mathbb{R}_+$ the storage capacity, and $p = (p_i)_{0 \le i < n} \in \mathbb{N}^n$ the profit vector.

We say that the interval $I_i := [start(i), end(i)]$ corresponds to movie $i \in V$, and call two movies $i, j \in V$ overlapping iff their corresponding intervals overlap, i.e. $I_i \cap I_j \ne \emptyset$. For $X \subseteq V$ we call $p_X := \sum_{i \in X} p_i$ the user satisfaction (with respect to $X$).

The Automatic Recording Problem (ARP) then is to find a subset $X \subseteq V$ such that

(a) $X$ can be stored within the given disc size, i.e. $\sum_{i \in X} w_i \le K$,

(b) at most one movie is recorded at a time, i.e. $I_i \cap I_j = \emptyset$ $\forall i \ne j \in X$,

(c) $X$ maximizes the user satisfaction, i.e. $p_X \ge p_Y$ $\forall Y \subseteq V$, $Y$ respecting (a), (b).

Obviously, even if all movies are pairwise non-overlapping (i.e., if restriction (b) is 
obsolete), it remains to solve a knapsack problem. Thus, the ARP is NP-hard. However, 
there is an FPTAS for the ARP. For details, we refer to [?].
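To make Definition 1 concrete, the following C sketch checks conditions (a) and (b) for a candidate selection. For tiny instances it also finds an optimal $X$ by enumerating all subsets. The `movie` struct and the function names are hypothetical, and intervals are treated as closed as in Definition 1. The enumeration is exponential, in line with the NP-hardness just noted, so it is only a reference for testing and not a practical method.

```c
#include <stddef.h>

typedef struct {
    double start, end; /* closed interval I_i = [start(i), end(i)] */
    double w;          /* storage requirement w_i */
    long p;            /* profit p_i */
} movie;

/* Returns p_X for X = {sel[0], ..., sel[k-1]} (distinct indices) if X
   satisfies (a) and (b) of Definition 1, and -1 otherwise. */
static long arp_value(const movie *v, const size_t *sel, size_t k, double K) {
    double used = 0.0;
    long px = 0;
    for (size_t a = 0; a < k; a++) {
        const movie *mi = &v[sel[a]];
        used += mi->w;
        px += mi->p;
        for (size_t b = a + 1; b < k; b++) {
            const movie *mj = &v[sel[b]];
            if (mi->start <= mj->end && mj->start <= mi->end)
                return -1; /* (b) violated: I_i and I_j intersect */
        }
    }
    return used <= K ? px : -1; /* (a): total storage at most K */
}

/* (c) by brute force over all 2^n subsets; only for tiny n (n < 20). */
static long arp_brute_force(const movie *v, size_t n, double K) {
    size_t sel[20];
    long best = 0; /* the empty set is always feasible */
    for (unsigned long mask = 0; mask < (1ul << n); mask++) {
        size_t k = 0;
        for (size_t i = 0; i < n; i++)
            if (mask & (1ul << i)) sel[k++] = i;
        long val = arp_value(v, sel, k, K);
        if (val > best) best = val;
    }
    return best;
}
```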

2.1 A Mathematical Programming Formulation

Because the problem of finding and proving optimal solutions is of interest in its own 
right, and also because the FPTAS we developed is far too memory consuming to be of 
practical relevance, we focus on exact approaches to solve the ARP. Using mathematical 
programming, the problem can be stated as an integer program (IP): 

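$$\begin{aligned}
\text{Maximize}\quad & \sum_{0 \le i < n} p_i x_i \\
\text{subject to}\quad & x_i + x_j \le 1 \quad \forall\, 0 \le i < j < n \text{ with } I_i \cap I_j \ne \emptyset \\
& \sum_{0 \le i < n} w_i x_i \le K \\
& x \in \{0,1\}^n
\end{aligned} \tag{1}$$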

The objective function maximizes the user satisfaction. Constraints of the form $x_i + x_j \le 1$ ensure that for overlapping intervals $I_i$, $I_j$ at most one movie can be selected. Storage restrictions are enforced by the capacity constraint $\sum_{0 \le i < n} w_i x_i \le K$. The formulation can be tightened by replacing the overlapping constraints with maximal clique constraints.

Definition 2. A set $C \subseteq V$ is called a conflict clique iff $I_i \cap I_j \ne \emptyset$ $\forall i, j \in C$. A conflict clique $C$ is called maximal iff $\forall D \subseteq V$, $D$ conflict clique: $C \subseteq D \Rightarrow C = D$. Let $M := \{C_0, \ldots, C_{m-1}\} \subseteq 2^V$ be the set of maximal conflict cliques.

On interval graphs, the computation of maximal cliques can be performed in time $O(n \log n)$. Then, restrictions of the form $\sum_{i \in C_p} x_i \le 1$ $\forall\, 0 \le p < m$ imply that $x_i + x_j \le 1$ for all nodes $i, j \in V$ whose corresponding intervals overlap. On the other hand, if $x_i + x_j \le 1$ for all overlapping intervals, it is also true that $\sum_{i \in C_p} x_i \le 1$ $\forall\, 0 \le p < m$. Thus, IP (1) is equivalent to

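$$\begin{aligned}
\text{Maximize}\quad & \sum_{0 \le i < n} p_i x_i \\
\text{subject to}\quad & \sum_{i \in C_p} x_i \le 1 \quad \forall\, 0 \le p < m \\
& \sum_{0 \le i < n} w_i x_i \le K \\
& x \in \{0,1\}^n
\end{aligned} \tag{2}$$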

To solve a (mixed) integer program, branch-and-bound approaches have proven to be efficient and widely applicable, and thus are most commonly used. In every search node, a bound based on some (often continuous) relaxation is computed. If that bound is worse than the objective value B of the incumbent solution, backtracking occurs. A successful application of the branch-and-bound paradigm relies heavily on tight bounds that can be computed quickly. Fixing variables can help to improve the performance of a branch-and-bound search if the VFA is both effective and efficient. Effective means that the algorithm must have an impact, i.e., it has to be able to fix many variables, whereas efficiency measures how quickly the routine works.

The effectiveness of a VFA mainly depends on the quality of the bounds it uses to estimate the impact of fixing a variable to one of its values. For the ARP, our experiments show that the continuous relaxation bound yields a good estimate on the solution quality that can be reached. Thus, it can be used for pruning purposes in a branch-and-bound approach. But it is not straightforward to see how this bound could be used for fixing variables effectively, that is, other than by probing via full reoptimization, which is inefficient. On the other hand, the fixing of variables with respect to reduced cost information can be done quickly, but is not very effective.

The ARP can be viewed as a combination of two simpler optimization problems: a knapsack problem, and a MWSSP on an interval graph. For the knapsack problem, an efficient VFA exists [?]. It runs in time $O(n \log n)$, and in amortized linear time for $\Omega(\log n)$ search nodes. In the following, we develop a VFA for the MWSSP substructure of the ARP.

3 Maximum Weighted Stable Set on Interval Graphs 

On interval graphs, the problem of finding a maximum weighted stable set [6] can be solved easily in time $O(n \log n)$ [?]. However, the existing algorithms based on sweep line or dynamic programming approaches neither provide dual values for the maximal clique constraints (which is important for the coupling of VFAs as we shall see later), nor do they suggest how variable fixing could be performed efficiently.
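For comparison with the approach developed below, here is a sketch of the classic dynamic program for a maximum weighted stable set of closed intervals, often called weighted interval scheduling. It runs in $O(n \log n)$ time after sorting by right endpoint. This is a standard textbook technique, not the algorithm of this paper. Like the existing methods mentioned above, it yields no dual values. The names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double start, end; /* closed interval [start, end] */
    long weight;
} interval;

static int by_end(const void *a, const void *b) {
    double x = ((const interval *)a)->end, y = ((const interval *)b)->end;
    return (x > y) - (x < y);
}

/* Weight of a maximum weighted stable set of closed intervals; sorts v by
   right endpoint. opt[k] is the best weight using the first k intervals. */
static long mwss_dp(interval *v, size_t n) {
    long *opt = malloc((n + 1) * sizeof *opt);
    if (!opt) abort();
    qsort(v, n, sizeof *v, by_end);
    opt[0] = 0;
    for (size_t k = 1; k <= n; k++) {
        /* lo = number of intervals ending strictly before v[k-1] starts;
           these are exactly the earlier intervals disjoint from v[k-1] */
        size_t lo = 0, hi = k - 1;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (v[mid].end < v[k - 1].start) lo = mid + 1;
            else hi = mid;
        }
        long take = v[k - 1].weight + opt[lo];
        opt[k] = take > opt[k - 1] ? take : opt[k - 1];
    }
    long best = opt[n];
    free(opt);
    return best;
}
```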

3.1 A Mathematical Programming Approach 

We present an algorithm based on mathematical programming for the MWSSP on interval graphs that provides us with dual information as a byproduct, and that can be extended to an efficient VFA for the problem. Due to space restrictions, we omit the proofs here. For details, please refer to [?].

Remark 1. $\bigcup_{0 \le p < m} C_p = V$, because $\{i\}$ is a conflict clique $\forall i \in V$. Thus, for every $i \in V$ there exists a maximal conflict clique $C_p$, $0 \le p < m$, such that $i \in C_p$.

Definition 3. We set $maxstart : M \to \mathbb{N}$, $maxstart(C) := \max_{i \in C}\{start(i)\}$.

Lemma 1. The function maxstart is injective.

Without loss of generality, we may thus assume that the conflict cliques are ordered with respect to maxstart, i.e. $maxstart(C_p) < maxstart(C_q)$ $\forall\, 0 \le p < q < m$.

Lemma 2. Let $0 \le p < r < m$ and $i \in C_p \cap C_r$. Then, $i \in C_q$ $\forall\, p < q < r$.

Corollary 1. $m \le n$.

Definition 4. We set $R_p := C_p \setminus C_{p+1}$ $\forall\, 0 \le p < m-1$ and $R_{m-1} := C_{m-1}$, and call every such $R_p$ a (maxstart) rest clique.

Remark 2. The rest cliques form a partition of V. 
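Phase 1 of the algorithm in this section needs three things: the maximal cliques ordered by maxstart, the rest cliques, and the indices of the first clique each node belongs to. A common way to obtain them is a sweep over sorted interval endpoints. The C sketch below is our own illustration of that technique, not code from the paper.

- Intervals are treated as closed, so starts are processed before ends at equal coordinates.
- A maximal clique is completed at each end event that directly follows a start event.
- `first[i]` is the index of the first maximal clique containing $i$.
- `rest[i]` is the index $p$ with $i \in R_p$, which by Lemma 2 is the last maximal clique containing $i$.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double x;   /* coordinate of the endpoint */
    int is_end; /* 0 = start(i), 1 = end(i) */
    size_t id;  /* interval index i */
} endpoint;

static int by_coord(const void *a, const void *b) {
    const endpoint *e = a, *f = b;
    if (e->x != f->x) return (e->x > f->x) - (e->x < f->x);
    return e->is_end - f->is_end; /* starts first: touching intervals meet */
}

/* Returns m, the number of maximal cliques C_0, ..., C_{m-1} in maxstart
   order, and sets first[i] (first clique of i) and rest[i] = p with i in R_p. */
static size_t interval_cliques(const double *start, const double *end,
                               size_t n, size_t *first, size_t *rest) {
    if (n == 0) return 0;
    endpoint *ev = malloc(2 * n * sizeof *ev);
    if (!ev) abort();
    for (size_t i = 0; i < n; i++) {
        ev[2 * i] = (endpoint){start[i], 0, i};
        ev[2 * i + 1] = (endpoint){end[i], 1, i};
    }
    qsort(ev, 2 * n, sizeof *ev, by_coord);
    size_t m = 0;
    int grown = 0; /* a start occurred since the last completed clique */
    for (size_t k = 0; k < 2 * n; k++) {
        size_t i = ev[k].id;
        if (!ev[k].is_end) {
            first[i] = m; /* the next clique completed will contain i */
            grown = 1;
        } else {
            if (grown) { m++; grown = 0; } /* active set is maximal: C_{m-1} */
            rest[i] = m - 1; /* last clique completed while i was active */
        }
    }
    free(ev);
    return m;
}
```

Sorting dominates, so the sweep takes $O(n \log n)$ time, matching the bound stated above for computing maximal cliques.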

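Let $C_0, \ldots, C_{m-1}$ denote the maximal conflict cliques ordered according to maxstart, and consider IP (3) that evolves from IP (2) by dropping the capacity restriction:

$$\begin{aligned}
\text{Maximize}\quad & \sum_{0 \le i < n} p_i x_i \\
\text{subject to}\quad & \sum_{i \in C_p} x_i \le 1 \quad \forall\, 0 \le p < m \\
& x \in \{0,1\}^n
\end{aligned} \tag{3}$$

The maximal conflict clique restrictions imply that $x_i + x_j \le 1$ for all nodes $i, j \in V$ whose corresponding intervals overlap. On the other hand, if $x_i + x_j \le 1$ for all overlapping intervals $I_i$ and $I_j$, it is also true that $\sum_{i \in C_p} x_i \le 1$ $\forall\, 0 \le p < m$. Thus, IP (3) solves the MWSSP on interval graphs.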

In the following, with $A \in \{0,1\}^{m \times n}$ we denote the matrix corresponding to IP (3), i.e. $A = (a_{pi})_{0 \le p < m,\, 0 \le i < n}$ with $a_{pi} = 1$ iff $i \in C_p$.

Theorem 1. The corresponding matrix A of IP (3) is an interval matrix. 

Corollary 2. IP (3) is totally unimodular. 

Let $c_i := -p_i$ for all $0 \le i < n$. With Corollary 2 it is now possible to solve the MWSSP on interval graphs as a Linear Program. With Remark 1 the maximal conflict clique restrictions imply that $x \le 1$, so the LP reads

$$\begin{aligned}
\text{Minimize}\quad & c^{T} x \\
\text{subject to}\quad & A x \le \mathbf{1} \\
& x \ge 0
\end{aligned} \tag{4}$$

A Pivot Selection Strategy. We use the Simplex method for solving LP (4). Let $R_0, \ldots, R_{m-1}$ denote the (maxstart) rest cliques. In iteration $0 \le t < m$, we choose $q := t$ as pivot row and $j \in R_q$ with the lowest reduced costs as pivot column. If the reduced costs of $j$ are lower than 0, we perform a pivot step. Otherwise we proceed with the next iteration immediately.

Theorem 2. After m such iterations, the Simplex tableau is primal and dual feasible. 

Before we can prove the above Theorem 2, we need to state two more lemmata. In the following, we refer to the matrix $A^t \in \{-1,0,1\}^{m \times n}$ after $0 \le t \le m$ Simplex iterations with $(a^t_{pi})_{0 \le p < m,\, 0 \le i < n}$, to the right hand side $b^t \in \mathbb{R}^m$ with $(b^t_p)_{0 \le p < m}$, and to the reduced costs $c^t$ with $(c^t_i)_{0 \le i < n}$.

We first show that our pivot selection preserves primal feasibility. We observe that $x = 0$ is primal feasible as $b^0_p = 1 \ge 0$ $\forall\, 0 \le p < m$. To assure the maintenance of primal feasibility, we must show that $b^t \ge 0$ $\forall\, 0 \le t \le m$. To do so, we can prove the following

Lemma 3. Let $0 \le p < m$, $0 \le t \le m$, $0 \le i < n$. Then,

(a) $p > t - 1$ implies that $a^t_{pi} = a^0_{pi}$ and $b^t_p = b^0_p = 1$

(b) $b^t_p = 0$, $i \in R_p$ implies $a^t_{pi} \in \{-1, 0\}$

(c) $b^t_p = 1$, $i \in R_p$ implies $a^t_{pi} \in \{0, 1\}$

(d) $b^t_p \in \{0, 1\}$

The following Lemma implies that after at most $m$ iterations we achieve dual feasibility.
Lemma 4. Let $1 \le t \le m$. Then,

(a) $c^t_i \ge 0$ for all $i \in R_{t-1}$.

(b) $i \in \bigcup_{0 \le p < t} R_p$ implies $c^{t+1}_i = c^t_i$ (for $t < m$).

Proof of Theorem 2. In Lemma 3 and Lemma 4 we have shown that after $m \le n$ iterations the Simplex tableau is primal and dual feasible. □

We have shown how the MWSSP on interval graphs can be stated as a totally unimodular LP. Further, we have given a pivot selection strategy and proven that it yields an optimal tableau after at most $n$ Simplex iterations.

An Efficient Simplex Realization. In the following, we develop an efficient $O(n \log n)$-time algorithm to calculate a set $Q \subseteq \{0, \ldots, n-1\}$ with $I_i \cap I_j = \emptyset$ $\forall i \ne j \in Q$, such that $c_Q := \sum_{i \in Q} c_i$ is minimal. Most importantly, the algorithm also provides us with dual information as a byproduct. To establish that algorithm, we show how the Simplex calculations according to the pivot strategy developed can be performed efficiently. Again, due to limited space we have to omit all proofs. They can be found in [?].

Theorem 3. Let $0 \le l < m$, and let $(j_0, \ldots, j_l) \in \{0, \ldots, n-1\}^{l+1}$ denote the sequence of pivot columns according to Section 3.1 for which a pivot step has been performed. Further, for all $0 \le k \le l$ let $0 \le p_k < m$ with $j_k \in R_{p_k}$. Set $Q := \{ j_k \mid 0 \le k \le l \text{ and } I_{j_k} \cap I_{j_r} = \emptyset \ \forall\, k < r \le l \}$. Then $c_Q$ is minimal.

The above Theorem 3 allows us to construct an optimal solution if we know the sequence of pivot elements. But according to Section 3.1, calculating the sequence of pivot elements is an easy task, if only we can determine the reduced costs quickly:

Lemma 5. Let $0 \le t < m$, let $(j_0, \ldots, j_t)$ be the sequence of pivot columns according to our pivot selection strategy, and let $d : \{0, \ldots, m-1\} \to \{0,1\}$ with $d(t) := 1$ if $c^t_{j_t} < 0$ and $d(t) := 0$ otherwise. Further, we set $q_t := t$, $0 \le q_t < m$, and let $j_t \in R_{q_t}$.

Let $z^t \in \mathbb{R}$ denote the objective function value and $a^t_{q_t j_t}$ be the pivot element in iteration $t$. Further, let $f_i := \min\{p \mid 0 \le p < m,\ i \in C_p\}$ $\forall\, 0 \le i < n$ denote the index of the first maximal conflict clique that node $i$ belongs to, i.e. $a^0_{f_i i} = 1$, and $a^0_{pi} = 0$ $\forall\, 0 \le p < f_i$. Finally, let $g^t_i := z^{f_i} - z^t$ if $f_i < t$, and otherwise $g^t_i := 0$, $\forall\, 0 \le i < n$. Then,

(a) $z^{t+1} = z^t + c^t_{j_t} \cdot d(t) \le z^t$.

(b) $z^t = \sum_{0 \le r < t} c^r_{j_r} \cdot d(r)$.

(c) $c^t_i = c_i + g^t_i$ $\forall\, i \in \bigcup_{t \le p < m} R_p$.

An Algorithm Providing Dual Information. With Theorem 3 and Lemma 5 we can formulate an efficient algorithm solving the MWSSP on interval graphs that provides us with dual values as a byproduct. In phase 1, we determine the (max start) rest cliques $R_p$, with $0 \le p < m$, and the corresponding values $f_i$ for all $0 \le i < n$. This can be done in time $O(n \log n)$.

Phase 2 consists of $m$ iterations: First, we set $z^0 := 0$. In each iteration $0 \le t < m$ we calculate $c^t_i = c_i + z^{f_i} - z^t$ for all $i \in R_t$, and $j_t \in R_t$ with $c^t_{j_t} = \min_{i \in R_t}\{c^t_i\}$. If $c^t_{j_t} \ge 0$, we set $z^{t+1} := z^t$, otherwise $z^{t+1} := z^t + c^t_{j_t}$. Finally, we set $t := t + 1$ and proceed with the next iteration.
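Phase 2 is a single pass over the rest cliques. The Python sketch below mirrors the update rules just stated. It takes the Phase 1 output (the rest cliques $R_t$ as lists of node indices, and the first-clique indices $f_i$) as given input, because the text does not spell out how Phase 1 computes them. It also assumes $f_i \le t$ for every $i \in R_t$, so that $z^{f_i}$ is already known when needed, and it treats an empty rest clique as an iteration without a pivot step. Both are assumptions of this illustration, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def phase2(costs: Sequence[float],
           rest_cliques: Sequence[Sequence[int]],
           first_clique: Sequence[int]) -> Tuple[List[float], List[Tuple[int, int]]]:
    """Return z^0..z^m and the (pivot column j_t, pivot row t) pairs of all pivot steps."""
    m = len(rest_cliques)
    z = [0.0] * (m + 1)                  # z^0 := 0
    pivots: List[Tuple[int, int]] = []
    for t in range(m):
        z[t + 1] = z[t]                  # default: no pivot, objective unchanged
        if not rest_cliques[t]:
            continue
        reduced = {}
        for i in rest_cliques[t]:
            assert first_clique[i] <= t, "z^{f_i} must already be known"
            reduced[i] = costs[i] + z[first_clique[i]] - z[t]   # c^t_i = c_i + z^{f_i} - z^t
        j_t = min(reduced, key=reduced.__getitem__)             # most negative reduced cost
        if reduced[j_t] < 0:             # d(t) = 1: pivot and improve the objective
            z[t + 1] = z[t] + reduced[j_t]
            pivots.append((j_t, t))
    return z, pivots


# Movies [0,2), [1,3), [2,4) with costs -1, -5, -2; cliques C_0 = {0,1}, C_1 = {1,2}.
# Assumed Phase 1 output: R_0 = {0}, R_1 = {1, 2}, f = (0, 0, 1).
z, pivots = phase2([-1.0, -5.0, -2.0], [[0], [1, 2]], [0, 0, 1])
print(z)       # [0.0, -1.0, -5.0]
print(pivots)  # [(0, 0), (1, 1)]
```

On this toy instance $z^2 = -5$ equals the cost of the best stable set $\{1\}$, and applying the construction of Theorem 3 to the pivot columns $(0, 1)$ yields $Q = \{1\}$.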

With Remark 13 we know that the sets $R_p$ form a partition of $V$. Thus, every node $0 \le i < n$ is looked at exactly once to calculate the reduced costs. Also, in all computations of the pivot columns, each node is incorporated only once. Therefore, phase 2 takes time $O(n)$.

After $m$ iterations, we know the shortest path value $z^m$, as well as the sequences $(j_0, \dots, j_l) \in V^{l+1}$ and $(p_0, \dots, p_l) \in \{0, \dots, m-1\}^{l+1}$, $0 \le l < m$, of pivot columns and rows for which a pivot step has been performed. By applying Theorem 3 we can construct a shortest path from this information in linear time. Moreover, as the rest cliques of the underlying interval graph are independent of the objective function, we obtain an incremental algorithm that runs in amortized linear time when it is called $\Omega(\log n)$ times with different objectives.

Most importantly, we get dual values as a byproduct. By looking at the optimal 
tableau, we find that the optimal dual variable for each maximal clique constraint 0 < 
t < m for which a pivot step has been carried out has value — c™. All other dual values 
are 0. 

3.2 Variable Fixing 

Now, we want to use the previously developed algorithm to perform variable fixing. To develop a VFA, we reinterpret the problem as finding a shortest path in a node-weighted co-interval graph. We introduce an artificial source $\sigma$ and sink $\tau$ before and after all other nodes, and define $Path_{\sigma,\tau}$ as the set of paths from $\sigma$ to $\tau$. The value $c_P$ of a path $P \in Path_{\sigma,\tau}$ is defined as $c_P := \sum_{i \in P \setminus \{\sigma,\tau\}} c_i$.
Definition 5. Given an upper bound $Z \in \mathbb{R}$, we define $Rem(Z)$ and $Req(Z)$ by $Rem(Z) := \{0 \le i < n \mid \forall P \in Path_{\sigma,\tau},\ i \in P : c_P > Z\}$ and $Req(Z) := \{0 \le i < n \mid \forall P \in Path_{\sigma,\tau},\ i \notin P : c_P > Z\}$. The variable fixing problem based on shortest path information (VFSPP) then is to determine $Rem(Z)$ and $Req(Z)$.
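For very small instances, Definition 5 can be evaluated literally, which is useful as a reference when testing a faster implementation. The Python sketch below assumes that a $\sigma$-$\tau$ path corresponds to a set of pairwise non-overlapping movies visited in time order, that the direct arc from $\sigma$ to $\tau$ (value 0) is allowed, and that intervals are half-open `(start, end)` pairs. It enumerates all such paths, so its running time is exponential in $n$. All names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, Sequence, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) (assumption)


def disjoint(a: Interval, b: Interval) -> bool:
    return a[1] <= b[0] or b[1] <= a[0]


def all_paths(intervals: Sequence[Interval]) -> Iterator[Set[int]]:
    """Every sigma-tau path, i.e. every set of pairwise disjoint intervals (incl. the empty one)."""
    n = len(intervals)
    for size in range(n + 1):
        for nodes in combinations(range(n), size):
            if all(disjoint(intervals[a], intervals[b]) for a, b in combinations(nodes, 2)):
                yield set(nodes)


def rem_req_bruteforce(intervals: Sequence[Interval], costs: Sequence[float],
                       Z: float) -> Tuple[Set[int], Set[int]]:
    paths = [(P, sum(costs[i] for i in P)) for P in all_paths(intervals)]
    n = len(intervals)
    # Rem(Z): every path through i exceeds Z; Req(Z): every path avoiding i exceeds Z
    rem = {i for i in range(n) if all(value > Z for P, value in paths if i in P)}
    req = {i for i in range(n) if all(value > Z for P, value in paths if i not in P)}
    return rem, req


movies = [(0, 2), (1, 3), (2, 4)]
costs = [-1.0, -5.0, -2.0]
print(rem_req_bruteforce(movies, costs, -4.0))  # ({0, 2}, {1})
```

With $Z = -4$, only the path $\{1\}$ (value $-5$) stays below the bound, so movies 0 and 2 can be removed and movie 1 is required.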

Removing Nodes. To compute $Rem(Z)$, we need to find the value of the shortest path from the source $\sigma$ via node $i$ to the sink $\tau$ for all $i \in \{0, \dots, n-1\}$. This can be done by calculating the shortest path distances from the source and to the sink. The algorithm in Section 3.1 can easily be adapted to determine the shortest path distances from the source to each node $j \in \{0, \dots, n-1\}$. For $0 \le i < n$, let $f_i := \min\{p \mid 0 \le p < m,\ i \in C_p\}$ denote the index of the first maximal conflict clique that node $i$ belongs to. Then, we have to solve the following Linear Program

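$$
\begin{aligned}
\text{Minimize}\quad & \sum_{0 \le i < j} c_i x_i && \text{(LP 5)}\\
\text{subject to}\quad & \sum_{i \in C_p} x_i \le 1 && \forall\, 0 \le p < f_j\\
& x \ge 0
\end{aligned}
$$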

According to the previously developed theory, the minimal objective value of LP (5) is exactly $z^{f_j}$. Thus, the shortest path distance of node $j$ is exactly $d_j = z^{f_j} + c_j$.

A similar argument shows that the shortest path distances to the sink can be determined by applying the algorithm of Section 3.1 using the last clique belongings $l_i := \max\{p \mid 0 \le p < m,\ i \in C_p\}$ for all $0 \le i < n$, and the min_end rest cliques, where $min\_end : M \to \mathbb{N}$, $min\_end(C) := \min_{i \in C}\{end(i)\}$. Solving LP (4) in this inverse manner yields objective function values for all iterations $0 \le t < m$.

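Then, the shortest path distance $\bar d_j$ from node $j$ to the sink is obtained analogously, as the corresponding objective value of this inverse run plus $c_j$. With those values at hand, we can determine the value $e_j$ of a shortest path through node $0 \le j < n$ by $e_j = d_j + \bar d_j - c_j$. Then, $Rem(Z) = \{0 \le i < n \mid e_i > Z\}$.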

We conclude that $Rem(Z)$ can be computed in time $O(n \log n)$, and in amortized linear time for $\Omega(\log n)$ calls of the VFA.
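The quantities $d_j$, $\bar d_j$ and $e_j = d_j + \bar d_j - c_j$ can also be obtained with a plain dynamic program over the intervals, which makes the formula for $Rem(Z)$ easy to check. This is a swapped-in technique, not the Simplex-based procedure described above: the Python sketch below runs in $O(n^2)$ rather than $O(n \log n)$ time. It assumes half-open `(start, end)` intervals and that a path may leave $\sigma$ or enter $\tau$ at any node. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) (assumption)


def path_values(intervals: Sequence[Interval], costs: Sequence[float]) -> List[float]:
    """e_j = d_j + dbar_j - c_j: value of a cheapest sigma-tau path through node j."""
    n = len(intervals)
    d = [0.0] * n      # cheapest path from sigma ending in j (includes c_j)
    for j in sorted(range(n), key=lambda k: intervals[k][0]):
        before = [d[i] for i in range(n) if intervals[i][1] <= intervals[j][0]]
        d[j] = costs[j] + min([0.0] + before)
    dbar = [0.0] * n   # cheapest path from j to tau (includes c_j)
    for j in sorted(range(n), key=lambda k: intervals[k][1], reverse=True):
        after = [dbar[k] for k in range(n) if intervals[k][0] >= intervals[j][1]]
        dbar[j] = costs[j] + min([0.0] + after)
    # c_j is counted in both d_j and dbar_j, so subtract it once
    return [d[j] + dbar[j] - costs[j] for j in range(n)]


def rem(intervals: Sequence[Interval], costs: Sequence[float], Z: float) -> Set[int]:
    """Rem(Z) = { i | e_i > Z }."""
    return {i for i, e in enumerate(path_values(intervals, costs)) if e > Z}


movies = [(0, 2), (1, 3), (2, 4)]
costs = [-1.0, -5.0, -2.0]
print(path_values(movies, costs))  # [-3.0, -5.0, -3.0]
print(rem(movies, costs, -4.0))    # {0, 2}
```

Processing nodes by increasing start (forward pass) and decreasing end (backward pass) guarantees that every predecessor or successor value is final before it is read.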

Now we know how to determine nodes that should be removed from the graph. Of course, other constraints may also remove nodes. In both cases, we must be able to take these changes into account efficiently in the next call to our routine. Without going into implementation details, we note that the data structures storing the rest cliques as well as the first and last clique belongings can be compressed in linear time to delete any number of nodes from the graph.

Requiring Nodes. To compute $Req(Z)$, we need to identify all nodes that must be an element of any path having a value lower than $Z$. Obviously, only nodes on the shortest path $S$ can have this property. For every node $j \in S$, we thus need to find the value of a shortest path $P$ with $j \notin P$.

Remark 3. Let $0 \le f_j \le l_j < m$ denote the first and the last clique belongings of $j$. Further, let $P$ be the shortest path with $j \notin P$. Obviously, either $I_j \cap I_i = \emptyset$ for all $i \in P$, or $I_j \cap I_i \ne \emptyset$ for some $i \in P$. In the first case, we know that the shortest path not using the time interval $I_j$ has the value $c_P = d_j + \bar d_j - 2c_j$. In the second case, we do not know how to determine the value of $P$ efficiently. However, after we have determined $Rem(Z)$ and deleted it from the graph, we only have to check whether there exists any node $i \in \{0, \dots, n-1\}$, $i \ne j$, with $I_j \cap I_i \ne \emptyset$. If such a node $i$ exists, there also exists a path $P$ with $i \in P$ and $c_P < Z$ (otherwise $i$ would have been deleted before). As $i$ and $j$ overlap, we further know that $j \notin P$. Thus, in $P$ we have found a path not covering $j$ with a value lower than $Z$. Therefore, $j \notin Req(Z)$. On the other hand, if no such node $i$ exists, the second case cannot occur, and we only need to consider the first case.

Based on this observation, we can determine $Req(Z)$ in linear time: First, for all $j \in S$ we check whether there exists a node in the shrunk graph that overlaps with $j$. Without specifying the implementation details here, we just note that this can be done in constant time for each node $j$. If no overlapping node exists, we compute $d_j + \bar d_j - 2c_j$ and check whether this value is lower than $Z$. If not, we add $j$ to $Req(Z)$; otherwise we do not.

Now, we have an efficient algorithm at hand to compute $Req(Z)$. Obviously, other constraints and branching decisions must be taken into account when our procedure is called next. Thus, we have to be able to transform our graph in such a way that from now on every path covers the newly required nodes. At first glance this sounds problematic, as a straightforward approach would delete all arcs going around the required nodes, which would cause the resulting graph to lose the co-interval property.

We can force the shortest path to visit the required nodes by making them extremely cheap: Let $Req \subseteq \{0, \dots, n-1\}$ be the set of (currently) required nodes. Further, let $M \gg 0$ be sufficiently large¹. Then, we set $\bar c_j := c_j - M$ for all $j \in Req$, and $\bar c_j := c_j$ for all $j \notin Req$. We use $\bar c$ instead of $c$ as our objective and check whether the shortest path value is lower than $Z - |Req| \cdot M$. If not, either two required nodes overlap, or the shortest path value in the original graph exceeds $Z$. Moreover, by determining $Rem(Z - |Req| \cdot M)$, we find all nodes that overlap with some required node plus all nodes that would cause the shortest path in the original graph to exceed the threshold $Z$. We conclude:

Theorem 4. The VFSPP can be solved in time $O(n \log n)$, or in amortized linear time for $\Omega(\log n)$ incremental calls of the variable fixing algorithm.
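The cost transformation that forces required nodes onto the shortest path can be sketched as follows. The Python code uses the choice $M := n \cdot (1 + \max c - \min c)$ from the footnote, which presupposes $\min c < 0$. The shortest path routine is a simple quadratic dynamic program over half-open `(start, end)` intervals that also admits the empty $\sigma$-$\tau$ path, and it stands in for the $O(n \log n)$ algorithm of Section 3.1. All function names are hypothetical.

```python
from typing import Sequence, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) (assumption)


def shortest_path_value(intervals: Sequence[Interval], costs: Sequence[float]) -> float:
    """Cheapest sigma-tau path in the co-interval graph (quadratic DP, empty path allowed)."""
    n = len(intervals)
    d = [0.0] * n
    for j in sorted(range(n), key=lambda k: intervals[k][0]):
        best = min([0.0] + [d[i] for i in range(n) if intervals[i][1] <= intervals[j][0]])
        d[j] = costs[j] + best
    return min([0.0] + d)


def big_m(costs: Sequence[float]) -> float:
    """M := n * (1 + max c - min c), assuming min c < 0."""
    return len(costs) * (1 + max(costs) - min(costs))


def required_nodes_feasible(intervals: Sequence[Interval], costs: Sequence[float],
                            required: Set[int], Z: float) -> bool:
    """Is the shortest path value under c-bar lower than Z - |Req| * M?"""
    M = big_m(costs)
    c_bar = [c - M if j in required else c for j, c in enumerate(costs)]
    return shortest_path_value(intervals, c_bar) < Z - len(required) * M


movies = [(0, 2), (1, 3), (2, 4)]
costs = [-1.0, -5.0, -2.0]
print(required_nodes_feasible(movies, costs, {0}, -2.5))  # True
print(required_nodes_feasible(movies, costs, {0}, -3.5))  # False
```

Requiring movie 0 is compatible with $Z = -2.5$, since the path $\{0, 2\}$ has value $-3$, but not with $Z = -3.5$. Requiring the overlapping movies 0 and 1 together is rejected for any $Z \le 0$.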



4 Coupling Variable Fixing Algorithms via Lagrangian Relaxation 

An obvious approach to solve the ARP exactly is to apply a branch-and-bound algorithm 
using linear relaxation bounds for pruning and the existing VFAs for knapsack and 
MWSSP on interval graphs for the fixing of variables. 

Although the existing VFAs are efficient and effective for the substructures they have been designed for, their application to the ARP is not. This is because neither assuming an unlimited storage capacity nor ignoring the fact that only one movie can be recorded at a time yields a tight bound on the objective of the combined problem. But the accuracy of the upper bound is essential for the effectiveness of a VFA. An accurate bound can only be computed by looking at the entire problem, i.e., it cannot be achieved by looking at only one of the two constraint families.

The linear relaxation bound can easily be obtained by applying a standard LP solver or by using specialized methods tailored for this specific application. That bound yields a good estimate of the performance that can (still) be reached. However, it is not straightforward to see how it could be exploited for the fixing of variables. Conventional reduced cost variable fixing techniques exploit the structure of the problem only indirectly and are therefore not effective enough, whereas probing via full reoptimization is very costly and inefficient.

¹ Assuming that $\min_{i \in V} c_i < 0$, a valid setting for $M$ is for example $M := n \cdot (1 + \max_{i \in V} c_i - \min_{i \in V} c_i)$.

Lagrangian relaxation (see e.g. (Jl for an introduction) allows us to bring together 
the advantages of a tight continuous global bound and the existing VFAs that exploit the 
special structure of their respective constraint families. As the stable set VFA allows us to incorporate changing objectives at a low computational cost, we decide to relax the capacity constraint. We introduce a non-negative Lagrange multiplier $\lambda \ge 0$ and define the Lagrangian subproblem

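$$
\begin{aligned}
\text{Maximize}\quad & z(\lambda) := z && (L(\lambda))\\
\text{subject to}\quad & z = \sum_{0 \le i < n} (p_i - \lambda w_i)\, x_i + \lambda K\\
& \sum_{i \in C_p} x_i \le 1 && \forall\, 0 \le p < m\\
& x \in \{0,1\}^n
\end{aligned}
$$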

The Lagrange multiplier problem then is to minimize $z(\lambda)$ subject to $\lambda \ge 0$. For every $\lambda \ge 0$, $z(\lambda)$ is a valid upper bound on the objective. Therefore, we can apply the VFA for MWSSP on interval graphs each time we solve the Lagrangian subproblem. After we have found an optimal Lagrange multiplier $\lambda^*$, i.e. $z(\lambda^*) \le z(\lambda)$ for all $\lambda \ge 0$, we can use the (optimal) dual information $\pi \in \mathbb{R}^m$ from the corresponding stable set subproblem to perform variable fixing with respect to the knapsack substructure. By Lagrange relaxing the maximal clique constraints with multipliers $\pi \ge 0$, we obtain a knapsack problem:

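Let $\tilde\pi_i := \sum_{p:\, i \in C_p} \pi_p$ for all $0 \le i < n$ and $\Pi := \sum_{0 \le p < m} \pi_p$. The problem then is to

$$
\begin{aligned}
\text{Maximize}\quad & \sum_{0 \le i < n} (p_i - \tilde\pi_i)\, x_i + \Pi\\
\text{subject to}\quad & \sum_{0 \le i < n} w_i x_i \le K\\
& x \in \{0,1\}^n
\end{aligned}
$$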

Relaxations of this problem again yield an upper bound on the objective, and we can 
apply the knapsack VFA from 0|. 
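To make the Lagrangian subproblem concrete, the following Python sketch evaluates $z(\lambda)$ by solving the stable set subproblem with profits $p_i - \lambda w_i$ through the standard weighted interval scheduling recursion, and then minimizes $z(\lambda)$ over $\lambda \ge 0$. The text defers the computation of the multipliers to the cited reference, so the ternary search used here is an illustrative substitute, not the authors' method. It relies on $z(\lambda)$ being a maximum of finitely many affine functions (hence convex) and on all weights $w_i$ being positive, so that a minimizer lies in $[0, \max_i p_i / w_i]$. Intervals are assumed half-open, and all names are hypothetical.

```python
import bisect
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # half-open broadcast interval [start, end) (assumption)


def max_weight_stable_set(intervals: Sequence[Interval], weights: Sequence[float]) -> float:
    """Maximum total weight of pairwise non-overlapping intervals (weighted interval scheduling)."""
    order = sorted(range(len(intervals)), key=lambda k: intervals[k][1])
    ends = [intervals[k][1] for k in order]
    best = [0.0] * (len(order) + 1)  # best[r]: optimum over the first r intervals by end time
    for r, k in enumerate(order):
        q = bisect.bisect_right(ends, intervals[k][0], 0, r)  # intervals ending before k starts
        best[r + 1] = max(best[r], best[q] + weights[k])
    return best[-1]


def lagrangian_bound(lam: float, intervals: Sequence[Interval], profits: Sequence[float],
                     weights: Sequence[float], capacity: float) -> float:
    """z(lambda): stable set optimum for profits p_i - lambda * w_i, plus lambda * K."""
    adjusted = [p - lam * w for p, w in zip(profits, weights)]
    return max_weight_stable_set(intervals, adjusted) + lam * capacity


def lagrangian_dual(intervals: Sequence[Interval], profits: Sequence[float],
                    weights: Sequence[float], capacity: float,
                    iterations: int = 200) -> Tuple[float, float]:
    """Approximately minimise the convex function z(lambda) over lambda >= 0 (ternary search)."""
    def z(lam: float) -> float:
        return lagrangian_bound(lam, intervals, profits, weights, capacity)

    # Beyond max p_i / w_i every adjusted profit is <= 0, so z(lambda) = lambda * K only grows.
    lo, hi = 0.0, max([0.0] + [p / w for p, w in zip(profits, weights)])
    for _ in range(iterations):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if z(m1) < z(m2):
            hi = m2
        else:
            lo = m1
    lam = (lo + hi) / 2
    return lam, z(lam)


# Toy instance: movie 1 overlaps movies 0 and 2; the disc holds only one movie.
movies = [(0, 3), (2, 5), (4, 7)]
lam, bound = lagrangian_dual(movies, profits=[3, 4, 3], weights=[3, 3, 3], capacity=3)
print(round(bound, 6))  # 4.0
```

Here $z(0) = 6$ (movies 0 and 2), while the Lagrangian dual closes the gap to the integer optimum 4 for every $\lambda \in [2/3, 4/3]$. A real implementation would also hand the final stable set duals $\pi$ to the knapsack VFA, which this sketch omits.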

In general, two linear optimization constraint families for which efficient VFAs are 
known can be combined effectively by computing Lagrangian multipliers for the first, 
using the second for fixing variables in each Lagrangian subproblem $L(\lambda)$, and then handing back dual information of the optimal $L(\lambda^*)$ to fix variables with respect to the first constraint family with the corresponding (optimal) reduced cost objective. This procedure even strengthens the bound on the objective, as variable fixing is also done during the bound computation. However, if variables are fixed during the process of finding optimal Lagrange multipliers, the algorithm that solves the Lagrangian dual must be aware of this. How, e.g., subgradient methods must be adapted to cope with that situation is a subject of further research.

5 Numerical Results 

We used four different approaches for our experiments: the first is a pure branch-and-bound algorithm without any variable fixing (referred to as no fixing (F-0)). The second uses the VFAs for knapsack and MWSSP on the original objective (fixing 1 (F-1)). The third and the fourth approach (fixing 2 (F-2) and fixing 3 (F-3)) realize the idea of coupling VFAs for linear optimization problems via Lagrangian relaxation. F-2 calls for the fixing of variables just once after the Lagrangian dual has been solved, whereas F-3 also performs maximum weighted stable set variable fixing during the search for optimal Lagrange multipliers. For details on the computation of the Lagrange multipliers and the choice of the branching variable, we refer to I fill .



5.1 Experiments 

All experiments were performed on a PC with an AMD Athlon 600 processor and 256 MB RAM running Linux 2.2. The implementation was done in C++ and compiled by gcc 2.95. The algorithms were built on top of Ilog Solver 5.0 M-

The ARP test instances were generated by specifying the time horizon and the number 
of channels. The generator sequentially fills the channels by starting each new movie 
one minute after the last. First, a class for the next movie is chosen randomly.
That class then determines the intervals from which the length and the profit are chosen 
randomly (for now, we have only been using 3 different classes of movies). The disc space 
necessary to store each movie equals its length, and the storage capacity is randomly 
chosen as 40%-60% of the entire time horizon. 

The experiment consists of 50 random instances per test set. For each instance, the approaches F-0 to F-3 were run to find and prove an optimal solution. The time horizons of 360, 720, 1440, 4320, and 7200 minutes correspond to 6 hours, 12 hours, one day, 3 days, and 5 days of digital television, respectively.

It should be noted that all approaches find a first solution rather early in the search. 
Therefore, the main work lies in the proof of optimality rather than in the construction 
of the solution. We conclude that the branching variable selection we used efficiently 
supports finding near-optimal solutions in a non-exhaustive search. 

Obviously, the linear continuous bound we use for pruning is rather tight. Except for 
the 4320-50 and 7200-20 instances, even F-0 is able to cope with all problem classes 
fairly well. Thus, it is justified to base pruning and also variable fixing on that bound. 

The positive effect of applying a VFA can be seen when comparing F-0 and F-1. 
Although the latter is not very effective, it is already able to reduce the number of choice 
points by a factor of 1.5 - 2. This results in a speed-up of up to 3 for the 7200-20 
instances. 

F-2 and F-3 yield a dramatic further reduction of choice points. For the 7200-20 
instances, F-3 only needs 10% of the choice points of F-1. For the 4320-50 instances, 
it even reduces the number of choice points to 7% compared to F-1, and to 3% with 
respect to the number of choice points visited by F-0. 
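The percentages and speed-ups quoted in this discussion can be recomputed from Table 1. The short Python sketch below copies the runtime and choice point figures for the two hardest test sets, 4320-50 and 7200-20, and prints the relative number of choice points and the runtime speed-ups. The dictionary layout and helper names are merely a convenient choice for this illustration.

```python
# (runtime in seconds, choice points) for the two hardest test sets, copied from Table 1
TABLE = {
    "4320-50": {"F-0": (12227.8, 149877.2), "F-1": (5647.5, 67864.5),
                "F-2": (1191.5, 4744.5), "F-3": (1197.3, 4703.5)},
    "7200-20": {"F-0": (5210.3, 60839.2), "F-1": (1734.4, 30676.4),
                "F-2": (455.9, 3433.9), "F-3": (490.1, 2945.1)},
}


def choice_point_share(test_set: str, method: str, baseline: str) -> float:
    """Choice points of `method` as a fraction of those of `baseline`."""
    return TABLE[test_set][method][1] / TABLE[test_set][baseline][1]


def speedup(test_set: str, method: str, baseline: str) -> float:
    """Runtime of `baseline` divided by runtime of `method`."""
    return TABLE[test_set][baseline][0] / TABLE[test_set][method][0]


print(f"{choice_point_share('7200-20', 'F-3', 'F-1'):.1%}")  # 9.6%
print(f"{choice_point_share('4320-50', 'F-3', 'F-1'):.1%}")  # 6.9%
print(f"{choice_point_share('4320-50', 'F-3', 'F-0'):.1%}")  # 3.1%
print(f"{speedup('7200-20', 'F-1', 'F-0'):.2f}")             # 3.00
print(f"{speedup('4320-50', 'F-2', 'F-0'):.2f}")             # 10.26
print(f"{speedup('7200-20', 'F-2', 'F-0'):.2f}")             # 11.43
```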

The gain in time is not as drastic, as the greater effectiveness must be paid for by a higher computational effort per search node. Thus, the maximum speed-up we get between F-0 and F-2 is “only” 10-11. And although F-3 is more effective and outperforms F-2 regarding the number of choice points, it is slower on all instances. The reason for both is that it performs maximum weighted stable set variable fixing for each Lagrangian subproblem. Of course, if the absolute time spent per choice point were higher (due to other variable fixing algorithms or more costly bound computations, for example), the increased effectiveness should also result in a faster overall computation. By using a parameter that determines the frequency of maximum weighted stable set variable fixing, we can trade time for effectiveness. In this way, we can tune our coupled VFA towards our specific application. To achieve a most effective filtering algorithm, the VFAs for both substructures could be applied for each Lagrangian subproblem.

Table 1. Numerical results for the ARP. The first three columns characterize each group of 50 instances by giving the planning horizon in minutes, the number of channels, and the average number of movies. Columns 4 to 7 present running times for the different approaches. In brackets, we give the number of search nodes, in the following referred to as choice points.

Test set              | runtime in sec. (choice points)
min. | #ch | ∅mov.   | no fixing F-0      | fixing F-1        | fixing F-2       | fixing F-3
360  | 5   | 29.0    | 0.1 (27.2)         | 0.1 (15.6)        | 0.0 (14.6)       | 0.0 (10.9)
360  | 20  | 115.6   | 0.4 (49.9)         | 0.2 (32.3)        | 0.1 (24.2)       | 0.2 (17.3)
360  | 50  | 285.2   | 1.1 (52.5)         | 0.8 (37.3)        | 0.3 (25.7)       | 0.5 (22.1)
360  | 100 | 569.0   | 2.9 (96.7)         | 2.1 (67.9)        | 0.8 (36.7)       | 1.2 (30.0)
720  | 5   | 52.4    | 0.1 (69.3)         | 0.1 (33.3)        | 0.1 (26.0)       | 0.1 (17.6)
720  | 20  | 208.3   | 0.6 (91.7)         | 0.4 (69.0)        | 0.3 (36.5)       | 0.4 (29.4)
720  | 50  | 523.4   | 4.2 (305.5)        | 2.4 (166.6)       | 1.3 (58.7)       | 2.0 (52.3)
720  | 100 | 1048.3  | 18.8 (680.8)       | 9.1 (326.3)       | 5.1 (108.6)      | 7.6 (95.5)
1440 | 5   | 101.4   | 0.8 (130.4)        | 0.7 (98.2)        | 0.2 (48.0)       | 0.4 (38.5)
1440 | 20  | 402.9   | 7.9 (479.6)        | 5.3 (306.1)       | 1.6 (103.0)      | 2.5 (81.9)
1440 | 50  | 1013.4  | 37.8 (1461.1)      | 19.5 (734.4)      | 11.1 (187.9)     | 13.8 (178.5)
1440 | 100 | 2019.8  | 291.0 (5351.2)     | 108.2 (2028.1)    | 55.6 (530.2)     | 71.0 (439.9)
4320 | 5   | 291.7   | A.l (508.2)        | 6.0 (327.3)       | 2.0 (138.9)      | 3.1 (121.6)
4320 | 20  | 1175.0  | 177.4 (3003.1)     | 111.7 (1897.2)    | 36.6 (468.4)     | 42.9 (401.2)
4320 | 50  | 2949.7  | 12227.8 (149877.2) | 5647.5 (67864.5)  | 1191.5 (4744.5)  | 1197.3 (4703.5)
7200 | 5   | 487.8   | 64.0 (3272.2)      | 43.5 (2705.4)     | 17.5 (878.0)     | 17.9 (434.5)
7200 | 20  | 1956.7  | 5210.3 (60839.2)   | 1734.4 (30676.4)  | 455.9 (3433.9)   | 490.1 (2945.1)

A topic of further investigation is the comparison of test sets with an equal number 
of movies: E.g., instances in 720-100, 1440-50, and 4320-20 all contain about 1000 
movies, and the solution time decreases with the number of channels increasing. In a 
test set with a high number of channels, many nodes are overlapping, hence the maximum 
weighted stable set VFA can easily eliminate many variables from consideration. With 
a lower number of channels and a wider time horizon, the knapsack constraint gains 
more importance. We conjecture that the reverse approach (i.e., Lagrange relaxing the non-overlapping constraints and solving a knapsack problem as subproblem) would be a suitable choice for such a situation.



6 Conclusions and Future Work 

For the automatic recording problem, we introduced, by way of example, the idea of coupling VFAs via Lagrangian relaxation. It allows us to combine existing variable fixing routines for linear optimization problems to obtain effective and efficient filtering algorithms based on tight global bounds. We believe that this idea is generic and independent of the specific application on which we based our empirical evaluation. The numerical results show a significant improvement due to the coupling method with respect to the computation time and other algorithmic measures such as the number of search nodes. The method is suited for linear optimization problems for which bounds based on continuous or Lagrangian relaxations can be used effectively.

In order to be able to couple VFAs for knapsack and MWSSPs, we first had to develop a VFA for the latter. Although algorithms running in time $O(n \log n)$ had been developed for the problem itself before, they did not provide the dual information that is needed for the coupling method we introduced. Moreover, to our knowledge no VFA for the MWSSP on interval graphs running in amortized linear time existed before. As the problem occurs as a substructure in many optimization problems, especially in scheduling contexts, the VFA we developed is, in our view, a contribution of general interest that goes beyond the application we presented.

For the multimedia application we introduced, we developed a refined IP formulation. 
The continuous relaxation of that IP yields a tight upper bound, as our experiments showed. Several extensions are possible for that application. A digital video recorder could have more than one recording unit, which would allow the simultaneous recording of a limited number of channels. In an IP context, this modification can be introduced easily.
For the new exact approach presented, a fast and efficient VFA for this type of relaxed 
non-overlapping constraint is subject to further research. 

Acknowledgement. We would like to thank Burkhard Monien for helpful comments. 
Abstract. This work presents approximation algorithms for scheduling the tasks of a parallel application that are subject to precedence constraints. The considered tasks are malleable, which means that they may be executed on a varying number of processors in parallel. The considered objective criterion is the makespan, i.e., the largest task completion time.

We demonstrate a close relationship between this scheduling problem and one of its subproblems, the allotment problem. By exploiting this relationship, we design a polynomial time approximation algorithm with performance guarantee arbitrarily close to $(3 + \sqrt{5})/2 \approx 2.61803$ for the special case of series parallel precedence constraints and for the special case of precedence constraints of bounded width. These special cases cover the important situation of tree structured precedence constraints.

For the general case with arbitrary precedence constraints, we give a polynomial time approximation algorithm with performance guarantee $3 + \sqrt{5} \approx 5.23606$.

1 Introduction 

Scheduling and load-balancing are central issues in the parallelization of large 
scale applications. One of the main problems in this area concerns efficient 
scheduling of the tasks of a parallel program. This problem asks to determine at 
what time and on which processor all the tasks should be executed. Among the 
various possible approaches, the most commonly used is to consider the tasks of 
the program at the finest level of granularity, and to apply some adequate cluster- 
ing heuristics for reducing the relative communication overhead; see Gerasoulis 
& Yang jEj. Several models have been developed for modeling the communica- 
tion and the parallelization overhead in these problems. In models with a finer 
communication representation (like in the LogP model [3]), the impact of the
parallelization overhead is usually ignored. 

Recently, a new computational model called Malleable tasks (MT) has been 
proposed by Turek, Wolf & Yu H31 as an alternative to the usual delay model. 
Malleable tasks are computational units which may be themselves executed in 
parallel. The influence of communications is taken into account implicitly by a
penalty factor which may be determined more or less precisely for each applica- 
tion; cf. Blayo, Debreu, Mounie & Trystram p. As the granularity in the MT 
model is large, the communications between the malleable tasks are usually neglected. We refer the reader to Lepere, Mounie, Robic & Trystram cn for more details and for motivations of the MT model. MT are closely related to two other models, namely to the model of multiprocessor tasks (see e.g. Drozdowski 0) and to the model of divisible tasks (Prasanna & Musicus |E|). The difference between these models lies in the freedom allowed in the task allotment, that is, in the number of processors which execute each task: A multiprocessor task must be executed by a fixed integer number of processors, whereas divisible tasks share the processors as a continuously divisible resource.



1.1 The Malleable Tasks Model 

Throughout this paper we assume that the parallel program is represented by a set of generic malleable tasks, that is, computational units that may be parallelized and that are linked by precedence constraints. The precedence constraints are determined a priori by the analysis of the data flow between the tasks. More formally, let $G = (V, E)$ be a directed graph where $V = \{1, 2, \dots, n\}$ represents the set of malleable tasks, and where $E \subseteq V \times V$ represents the set of precedence constraints among the tasks. If there is an arc from task $i$ to task $j$ in $E$, then task $i$ must be processed completely before task $j$ can begin its execution. This situation will be denoted by $i \to j$; $i$ is called a predecessor of $j$, and $j$ is called a successor of $i$. All tasks are available at time 0 for execution, and they are to be scheduled on an overall number of $m$ processors. Every task $j$ is specified by $m$ positive integers $p_{j,q}$ ($1 \le q \le m$), where $p_{j,q}$ denotes the execution time of task $j$ when it is executed in parallel on $q$ processors.

Motivated by the usual behavior of parallel programs (cf. Cosnard & Trystram [2]), we make the following assumptions on the task execution times. Blayo, Debreu, Mounie & Trystram have shown these assumptions to be realistic while implementing actual parallel applications.

Assumption 1 (Monotonous penalty assumptions)

(a) The execution time $p_{j,q}$ of a malleable task $j$ is a non-increasing function of the number $q$ of processors executing the task.

(b) The work $w_{j,q} = q \cdot p_{j,q}$ of a malleable task $j$ is a non-decreasing function of the number $q$ of processors executing the task.

Assumption (a) means that adding processors for executing a malleable task cannot increase its execution time. In practice, the execution time even decreases in this situation, at least up to a threshold beyond which there is no more parallelism. Assumption (b) reflects that the total overhead for managing and administrating the parallelism usually increases with the number of processors.
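Assumption 1 is easy to verify mechanically for a given task. The Python sketch below takes the list of execution times $p_{j,1}, \dots, p_{j,m}$ of one task (index 0 holds $p_{j,1}$) and tests both conditions for consecutive processor counts; checking neighbours suffices, because monotonicity between neighbours implies monotonicity overall. The function name is hypothetical.

```python
from typing import Sequence


def satisfies_monotonous_penalty(times: Sequence[int]) -> bool:
    """times[q - 1] = p_{j,q}: execution time of the task on q processors."""
    for q in range(1, len(times)):
        # compare q processors (times[q - 1]) with q + 1 processors (times[q])
        if times[q] > times[q - 1]:
            return False  # (a) violated: more processors made the task slower
        if (q + 1) * times[q] < q * times[q - 1]:
            return False  # (b) violated: more processors reduced the total work
    return True


print(satisfies_monotonous_penalty([4, 2]))     # True  (work 4, 4)
print(satisfies_monotonous_penalty([6, 4, 3]))  # True  (work 6, 8, 9)
print(satisfies_monotonous_penalty([6, 2]))     # False (work drops from 6 to 4)
```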

A schedule $\sigma$ is specified by two functions $start_\sigma : V \to \mathbb{N}$ and $allot_\sigma : V \to [1, m]$, where the function $start_\sigma$ associates to each task a date of execution (or starting time), and where the function $allot_\sigma$ specifies the number of processors used to execute a task. In schedule $\sigma$, the task $j$ completes at time $C_\sigma(j) = start_\sigma(j) + p_{j,allot_\sigma(j)}$. We say that task $j$ is active during the time interval from $start_\sigma(j)$ to $C_\sigma(j)$, and we denote by $active(t)$ the set of all tasks that are active at time $t$. A schedule $\sigma$ is a feasible schedule if at any moment $t$ in time at most $m$ processors are engaged in the computation,

$$\sum_{j \in active(t)} allot_\sigma(j) \le m \quad \text{for all } t \ge 0,$$

and if all the precedence constraints are respected:

$$start_\sigma(i) + p_{i,allot_\sigma(i)} \le start_\sigma(j) \quad \text{for all } i \to j.$$

The makespan $C_{\max}$ of a schedule $\sigma$ is the maximum of all task completion times $C_\sigma(j)$. We now introduce the central problem of this paper.

Problem 1. MAKESPAN FOR MALLEABLE TASKS (Mt-Makespan)
Instance: A directed graph $G = (V, E)$ that represents a set of $n$ precedence constrained malleable tasks; the number $m$ of processors; positive integers $p_{j,q}$ with $1 \le j \le n$ and $1 \le q \le m$ that specify the task execution times.

Goal: Find a feasible schedule that minimizes the makespan $C_{\max}$.
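The two feasibility conditions and the makespan translate directly into code. The Python sketch below assumes 0-based task indices, that `p[j][q - 1]` holds $p_{j,q}$, and that a task is active on the half-open interval $[start_\sigma(j), C_\sigma(j))$. Under that convention the processor usage only increases at start times, so it suffices to check the capacity condition at those instants. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def completion_times(start: Sequence[int], allot: Sequence[int],
                     p: Sequence[Sequence[int]]) -> List[int]:
    """C(j) = start(j) + p_{j, allot(j)}, with p[j][q - 1] holding p_{j,q}."""
    return [start[j] + p[j][allot[j] - 1] for j in range(len(start))]


def is_feasible(start: Sequence[int], allot: Sequence[int], p: Sequence[Sequence[int]],
                edges: Sequence[Tuple[int, int]], m: int) -> bool:
    if any(not 1 <= a <= m for a in allot):
        return False
    C = completion_times(start, allot, p)
    # precedence: start(i) + p_{i, allot(i)} <= start(j) for every arc i -> j
    if any(C[i] > start[j] for i, j in edges):
        return False
    # capacity: tasks are active on [start, C); usage can only jump up at start times
    for t in set(start):
        used = sum(allot[j] for j in range(len(start)) if start[j] <= t < C[j])
        if used > m:
            return False
    return True


def makespan(start: Sequence[int], allot: Sequence[int], p: Sequence[Sequence[int]]) -> int:
    return max(completion_times(start, allot, p))


p = [[4, 2], [6, 4]]  # task 0 and task 1, each on 1 or 2 processors
print(is_feasible([0, 2], [2, 2], p, [(0, 1)], m=2), makespan([0, 2], [2, 2], p))  # True 6
```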

Consider an instance of Mt-Makespan, and assume that some processor allotment $\alpha$ has been prespecified for all tasks, and that task $j$ is to be executed on exactly $\alpha_j$ processors. Then the execution time of task $j$ is $p_j := p_{j,\alpha_j}$, and its work is $\alpha_j p_j$. With every directed path through the precedence graph $G = (V, E)$, we associate the total execution time $\sum p_j$ of the vertices on this path. The longest path under this definition of length is called the critical path of the allotment $\alpha$, and its length is denoted by $L^\alpha$. Moreover, we denote by $W^\alpha := \sum_j \alpha_j p_j$ the overall work in allotment $\alpha$. Clearly,

$$c(\alpha) := \max\left\{L^\alpha, \frac{W^\alpha}{m}\right\} \le C_{\max} \qquad (1)$$

holds for the makespan $C_{\max}$ of any feasible schedule $\sigma$ under allotment $\alpha$: Since the schedule must obey the precedence constraints, the tasks along the critical path form a chain that forces the makespan to be at least $L^\alpha$. Since the total work $W^\alpha$ can only be distributed across $m$ processors, some processor will run for at least $W^\alpha/m$ time units. The value $c(\alpha)$ in equation (1) will be called the cost of allotment $\alpha$. With this discussion, it is fairly natural to consider the following auxiliary problem.
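For a fixed allotment $\alpha$, the cost $c(\alpha)$ from inequality (1) can be computed in linear time by a longest path computation in the precedence DAG together with a sum for the total work. The Python sketch below uses Kahn's topological ordering. It assumes 0-based task indices and that `p[j][q - 1]` holds $p_{j,q}$; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def allotment_cost(n: int, edges: Sequence[Tuple[int, int]], p: Sequence[Sequence[int]],
                   alloc: Sequence[int], m: int) -> float:
    """c(alpha) = max{L^alpha, W^alpha / m} for tasks 0..n-1 and allotment alloc."""
    length = [p[j][alloc[j] - 1] for j in range(n)]   # p_j := p_{j, alpha_j}
    succ: List[List[int]] = [[] for _ in range(n)]
    indeg = [0] * n
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    finish = length[:]          # longest path ending in j, including j itself
    ready = [j for j in range(n) if indeg[j] == 0]
    processed = 0
    while ready:                # Kahn: a task is expanded only after all its predecessors
        i = ready.pop()
        processed += 1
        for j in succ[i]:
            finish[j] = max(finish[j], finish[i] + length[j])
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    if processed != n:
        raise ValueError("precedence graph contains a cycle")
    critical_path = max(finish, default=0)                 # L^alpha
    work = sum(alloc[j] * length[j] for j in range(n))      # W^alpha
    return float(max(critical_path, work / m))


# Chain 0 -> 1 on m = 2 processors, with p_{0,q} = (4, 2) and p_{1,q} = (6, 4).
print(allotment_cost(2, [(0, 1)], [[4, 2], [6, 4]], [2, 1], 2))  # 8.0 (critical path 2 + 6)
print(allotment_cost(2, [(0, 1)], [[4, 2], [6, 4]], [2, 2], 2))  # 6.0 (L = 6, W/m = 12/2)
```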

Problem 2. ALLOTMENT FOR MALLEABLE TASKS (Mt-Allotment)
Instance: A directed graph $G = (V, E)$ that represents a set of $n$ precedence constrained malleable tasks; the number $m$ of processors; positive integers $p_{j,q}$ with $1 \le j \le n$ and $1 \le q \le m$ that specify the task execution times.

Goal: Find an allotment $\alpha : V \to [1, m]$ that minimizes the cost $c(\alpha)$.
1.2 Known Results



The complexity of the makespan problem for malleable tasks has been studied in the paper of Du and Leung. The problem with arbitrary precedence constraints is strongly NP-hard for m = 2 processors, and the problem of scheduling independent malleable tasks is strongly NP-hard for m = 5 processors.

Only a few positive results are available for scheduling malleable tasks. Prasanna & Musicus [12] proposed an algorithm for some specially structured precedence task graphs for the so-called continuous version of the problem; in the continuous version, a non-integer number of processors may be allotted to any task. Moreover, they assume the same speed-up function for all tasks. The results of Lenstra & Rinnooy Kan nm for makespan minimization of precedence constrained sequential tasks imply that unless P=NP, makespan minimization of precedence constrained malleable tasks cannot have a polynomial time approximation algorithm with worst case performance guarantee better than 4/3.

The ALLOTMENT PROBLEM FOR MALLEABLE TASKS is closely re- 
lated to the discrete time-cost tradeoff problem, a well-known problem from the 
project management literature; see e.g. De, Dunne, Ghosh & Wells 0. The dis- 
crete time-cost tradeoff problem is a bicriteria problem for projects, where a 
project essentially is a system of precedence constrained tasks. Every task may 
be executed according to several different alternatives, where each alternative 
takes a certain amount of time and costs a certain amount of money. By select- 
ing one alternative for every task, one fixes the cost (= total cost of all tasks) 
and the duration (= length of the longest chain) of the project. In the budget 
variant of the discrete time-cost tradeoff problem, the instance consists of such 
a project together with a cost bound C. The goal is to select alternatives for all 
tasks such that the project duration is minimized subject to the condition that 
the project cost is at most $C$; the corresponding optimal duration is denoted by $D^*(C)$. By rounding the solutions of a linear programming relaxation, Skutella m derives a polynomial time algorithm for this budget variant that finds a solution with project cost at most $2C$ and project duration at most $2D^*(C)$.

Now let us discuss the connection between the allotment problem Mt-Allotment and the discrete time-cost tradeoff problem. In the allotment problem Mt-Allotment, every task $j$ can be executed in $m$ alternative ways by assigning $\alpha_j$ machines to it, where $1 \le \alpha_j \le m$. In the language of the discrete time-cost tradeoff problem, the resulting duration of task $j$ is $p_{j,\alpha_j}$ and the resulting cost of task $j$ is $\alpha_j p_{j,\alpha_j}/m$, i.e., its contribution to the value $W^\alpha/m$. Then the corresponding project cost equals $W^\alpha/m$, the corresponding project duration equals $L^\alpha$, and the maximum of these two values equals the cost $c(\alpha)$ of allotment $\alpha$. By combining the above mentioned result of Skutella m with a binary search procedure, we now get the following proposition.



Proposition 1. The ALLOTMENT PROBLEM FOR MALLEABLE TASKS
possesses a polynomial time 2-approximation algorithm. ■
We furthermore note that the arguments of De, Dunne, Ghosh & Wells […] imply that the ALLOTMENT PROBLEM FOR MALLEABLE TASKS is NP-complete in the strong sense.
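The correspondence above can be made concrete: the cost $c(\alpha)$ is the larger of the critical-path length $L^\alpha$ (the project duration) and the average work $W^\alpha/m$ (the project cost). The following is a minimal Python sketch of evaluating it, assuming that tasks are numbered in topological order and that execution times are given as a per-task dictionary `p[j][q]`; the function name and input format are our own illustration, not taken from the paper.

```python
from typing import Dict, List


def allotment_cost(alloc: List[int], p: List[Dict[int, float]],
                   preds: List[List[int]], m: int) -> float:
    """Return c(alpha) = max{L^alpha, W^alpha / m} for the allotment alloc.

    alloc[j] -- processors alpha_j allotted to task j (1 <= alpha_j <= m)
    p[j][q]  -- execution time of task j on q processors
    preds[j] -- immediate predecessors of task j; tasks are assumed to be
                numbered topologically (predecessors have smaller indices)
    """
    n = len(alloc)
    chain_end = [0.0] * n   # length of the longest chain ending with task j
    work = 0.0              # W^alpha, i.e. m times the "project cost"
    for j in range(n):
        duration = p[j][alloc[j]]
        chain_end[j] = duration + max((chain_end[i] for i in preds[j]), default=0.0)
        work += alloc[j] * duration
    critical_path = max(chain_end, default=0.0)   # L^alpha, the "project duration"
    return max(critical_path, work / m)


# Tasks 0 and 1 are independent; task 2 must wait for both (m = 2).
p = [{1: 4, 2: 2}, {1: 2, 2: 2}, {1: 6, 2: 3}]
preds = [[], [], [0, 1]]
print(allotment_cost([2, 1, 2], p, preds, m=2))   # L = 5, W/m = 6 -> 6.0
print(allotment_cost([1, 1, 1], p, preds, m=2))   # L = 10, W/m = 6 -> 10.0
```

A single pass suffices because, in topological order, the longest chain ending at a task is known once all its predecessors have been processed.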



2 Results and Outline of the Paper 

We stress that all results in this paper are based on the monotonous penalty assumption. In this paper, we derive polynomial time approximation algorithms for various cases of the MAKESPAN PROBLEM FOR MALLEABLE TASKS and of the ALLOTMENT PROBLEM FOR MALLEABLE TASKS. Let us first define for $m \ge 2$ the real numbers $r(m)$ by

$$ r(m) = \min_{1 \le \mu \le (m+1)/2} \max\left\{ \frac{m}{\mu},\ \frac{2m-\mu}{m-\mu+1} \right\}. \qquad (2) $$

Moreover, let $\mu(m)$ be the integer $\mu$ with $1 \le \mu \le (m+1)/2$ for which this minimum is attained. The following lemma provides the reader with some intuition on the (somewhat erratic) behaviour of the values $r(m)$ and $\mu(m)$. For small $m$, the values of $\mu(m)$ and $r(m)$ are listed in Figure 1.

Lemma 1. The real numbers $r(m)$ and the integers $\mu(m)$ satisfy the following properties.

(i) For all $m \ge 2$, we have $r(m) < (3+\sqrt{5})/2 \approx 2.61803$.

(ii) As $m$ tends to infinity, $r(m)$ tends to $(3+\sqrt{5})/2$.

(iii) For every $m \ge 2$, the value $\mu(m)$ either equals the integer above or the integer below $\frac{1}{2}\left(3m - \sqrt{5m^2 - 4m}\right)$.

(iv) For every $m \ge 2$ with $m \ne 3$ and $m \ne 5$, we have $\mu(m) \le m/2$.

(v) As $m$ tends to infinity, $\mu(m)/m$ tends to $(3-\sqrt{5})/2 \approx 0.38196$. ■




 m  μ(m)  r(m)   |  m  μ(m)  r(m)   |  m  μ(m)  r(m)   |  m  μ(m)  r(m)
 2   1   2.0000  | 10   4   2.5000  | 18   8   2.5454  | 26  11   2.5625
 3   2   2.0000  | 11   5   2.4285  | 19   8   2.5000  | 27  11   2.5294
 4   2   2.0000  | 12   5   2.4000  | 20   8   2.5000  | 28  11   2.5454
 5   3   2.3333  | 13   6   2.5000  | 21   9   2.5384  | 29  12   2.5555
 6   3   2.2500  | 14   6   2.4444  | 22   9   2.5000  | 30  12   2.5263
 7   3   2.3333  | 15   6   2.5000  | 23   9   2.5555  | 31  13   2.5789
 8   4   2.4000  | 16   7   2.5000  | 24  10   2.5333  | 32  13   2.5500
 9   4   2.3333  | 17   7   2.4545  | 25  10   2.5000  | 33  13   2.5384

Fig. 1. A listing of the values $\mu(m)$ and $r(m)$ for $2 \le m \le 33$.
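The values in Fig. 1 follow directly from the definition of $r(m)$ as a minimum over the integers $1 \le \mu \le (m+1)/2$. A minimal Python sketch that recomputes them in exact rational arithmetic is shown below; breaking ties toward the smallest $\mu$ is our own convention (it matches the table at $m = 10$, where $\mu = 4$ and $\mu = 5$ both give $5/2$), and the fourth decimal is truncated rather than rounded, as in the table.

```python
import math
from fractions import Fraction
from typing import Tuple


def r_and_mu(m: int) -> Tuple[Fraction, int]:
    """Return (r(m), mu(m)): minimize max{m/mu, (2m-mu)/(m-mu+1)} over mu.

    mu ranges over the integers 1 <= mu <= (m + 1) / 2.  If several mu attain
    the minimum, the smallest one is returned (tie-breaking chosen here).
    """
    if m < 2:
        raise ValueError("r(m) is only tabulated for m >= 2")
    best_r, best_mu = None, None
    for mu in range(1, (m + 1) // 2 + 1):
        value = max(Fraction(m, mu), Fraction(2 * m - mu, m - mu + 1))
        if best_r is None or value < best_r:
            best_r, best_mu = value, mu
    return best_r, best_mu


def truncate4(x: Fraction) -> str:
    """Format x with four decimals, truncating (not rounding) like Fig. 1."""
    return f"{math.floor(x * 10000) / 10000:.4f}"


for m in range(2, 34):
    r, mu = r_and_mu(m)
    print(f"{m:3d} {mu:3d} {truncate4(r)}")
```

Using `Fraction` avoids spurious ties or misorderings that floating point could introduce when the two terms of the maximum coincide.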



Approximation Algorithms for Scheduling Malleable Tasks 151 



The straightforward proof of Lemma 1 is omitted. The following theorem summarizes our main structural result on the problems Mt-Makespan and Mt-Allotment; its proof can be found in Section 3. The theorem demonstrates that these two problems are strongly interlocked and interrelated. Moreover, up to some small constant factor, it is sufficient to deal with the approximability of the (seemingly easier) problem Mt-Allotment.

Theorem 2. If there exists a polynomial time $g$-approximation algorithm A for problem Mt-Allotment on $m$ processors, then there exists a polynomial time $g \cdot r(m)$-approximation algorithm B for problem Mt-Makespan on $m$ processors.

An immediate consequence of Proposition 1, Theorem 2 and Lemma 1(i) is the following corollary.

Corollary 1. The MAKESPAN PROBLEM FOR MALLEABLE TASKS possesses a polynomial time approximation algorithm with performance guarantee $3 + \sqrt{5} \approx 5.23606$. ■

The following Theorem 3 will be a strong and helpful tool for handling specially structured precedence constraints. Its proof can be found in Section 4.

Theorem 3. Consider the decision version of problem Mt-Allotment where, for a given instance $I$ of Mt-Allotment and for a positive integer bound $X$, one must decide whether there exists an allotment of cost at most $X$. If there exists a pseudo-polynomial time exact algorithm for this decision version with running time polynomially bounded in the size of $I$ and in the value of $X$, then there exists a fully polynomial time approximation scheme for problem Mt-Allotment.

A directed precedence graph $G = (V, E)$ is series parallel if (i) it is a single vertex, (ii) it is the series composition of two series parallel graphs, or (iii) it is the parallel composition of two series parallel graphs. Only graphs that can be constructed via rules (i)-(iii) are series parallel. Here the series composition of two directed graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with $V_1 \cap V_2 = \emptyset$ is the graph that results from $G_1$ and $G_2$ by making all vertices in $V_1$ predecessors of all vertices in $V_2$, whereas the parallel composition of $G_1$ and $G_2$ simply is their disjoint union. Series parallel precedence constraints are a proper generalization of tree precedence constraints. We have the following result for series parallel precedence constraints.
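Rules (i)-(iii) translate directly into three constructors. The sketch below is a minimal illustration, assuming a graph is represented as a pair (vertex set, arc set); the function names `single`, `series`, and `parallel` are our own. Note that the series rule adds an arc from every vertex of $G_1$ to every vertex of $G_2$, including arcs that are implied transitively.

```python
from typing import FrozenSet, Tuple

Arc = Tuple[str, str]
Graph = Tuple[FrozenSet[str], FrozenSet[Arc]]   # (vertex set, arc set)


def single(v: str) -> Graph:
    """Rule (i): a graph consisting of a single vertex."""
    return frozenset({v}), frozenset()


def _check_disjoint(g1: Graph, g2: Graph) -> None:
    if g1[0] & g2[0]:
        raise ValueError("compositions require disjoint vertex sets")


def series(g1: Graph, g2: Graph) -> Graph:
    """Rule (ii): every vertex of g1 becomes a predecessor of every vertex of g2."""
    _check_disjoint(g1, g2)
    (v1, e1), (v2, e2) = g1, g2
    return v1 | v2, e1 | e2 | frozenset((a, b) for a in v1 for b in v2)


def parallel(g1: Graph, g2: Graph) -> Graph:
    """Rule (iii): the disjoint union of g1 and g2."""
    _check_disjoint(g1, g2)
    (v1, e1), (v2, e2) = g1, g2
    return v1 | v2, e1 | e2


# Fork-join: task a, then b and c in parallel, then d.
V, E = series(single("a"), series(parallel(single("b"), single("c")), single("d")))
print(sorted(E))   # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'd'), ('c', 'd')]
```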

Theorem 4. There exists a pseudo-polynomial time exact algorithm for the decision version of the restriction of problem Mt-Allotment to series parallel precedence graphs.

Two tasks $i$ and $j$ are called independent if neither $i$ is a predecessor of $j$ nor $j$ is a predecessor of $i$. A set of tasks is independent if the tasks in it are pairwise independent. The width of the precedence graph $G$ is the cardinality of its largest independent set. We have the following result for precedence graphs of bounded width.
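The paper does not say how the width is computed; one standard polynomial-time method (our choice here, not the authors') uses Dilworth's theorem: the largest independent set equals the smallest number of chains covering all tasks, which is $n$ minus the size of a maximum matching in the bipartite graph with an edge $(u, v)$ whenever $u$ precedes $v$ in the transitive closure. A minimal sketch, assuming tasks $0, \dots, n-1$ and an acyclic list of arcs:

```python
from typing import List, Set, Tuple


def reachability(n: int, arcs: List[Tuple[int, int]]) -> List[Set[int]]:
    """reach[u] = set of tasks that u precedes (transitive closure via DFS)."""
    succ: List[Set[int]] = [set() for _ in range(n)]
    for a, b in arcs:
        succ[a].add(b)
    reach: List[Set[int]] = []
    for s in range(n):
        seen: Set[int] = set()
        stack = list(succ[s])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        reach.append(seen)
    return reach


def width(n: int, arcs: List[Tuple[int, int]]) -> int:
    """Width of an acyclic precedence graph on tasks 0..n-1 (Dilworth)."""
    reach = reachability(n, arcs)
    matched_to = [-1] * n          # matched_to[v] = u for a matched edge (u, v)

    def augment(u: int, visited: Set[int]) -> bool:
        # Kuhn's augmenting-path search starting from left vertex u.
        for v in reach[u]:
            if v not in visited:
                visited.add(v)
                if matched_to[v] == -1 or augment(matched_to[v], visited):
                    matched_to[v] = u
                    return True
        return False

    matching = sum(1 for u in range(n) if augment(u, set()))
    return n - matching


print(width(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))   # fork-join: 2
print(width(3, [(0, 1), (1, 2)]))                   # a single chain: 1
```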
Theorem 5. There exists a pseudo-polynomial time exact algorithm for the decision version of the restriction of problem Mt-Allotment to precedence graphs whose width is bounded by a constant $d$.

The proofs of Theorems 4 and 5 are sketched in Section 5. Finally, by combining the statements in Theorems 2, 3, 4 and 5 we derive the following corollary.

Corollary 2. For the restriction of the MAKESPAN PROBLEM FOR MALLEABLE TASKS to (a) series parallel precedence graphs and to (b) precedence graphs of bounded width, there exist polynomial time approximation algorithms whose performance guarantee can be made arbitrarily close to $(3+\sqrt{5})/2$. ■

3 From Allotments to Makespans

In this section we will prove Theorem 2. Consider an instance $I$ of the malleable tasks problem as defined in Problems 1 and 2. Consider an optimal allotment $\alpha^+$ and a $g$-approximate allotment $\alpha'$ for instance $I$ with respect to problem Mt-Allotment. Denote by $W^+$ and $W'$ the total work in these two allotments, and by $L^+$ and $L'$ the lengths of their critical paths, respectively. Since $\alpha'$ is a $g$-approximate allotment, we have

$$ \max\{L',\ W'/m\} \le g \cdot \max\{L^+,\ W^+/m\}. \qquad (3) $$

Moreover, consider an optimal feasible schedule for instance $I$ with respect to problem Mt-Makespan, and let $C^*_{\max}$ denote the optimal makespan. The makespan of any schedule is at least the critical path length and at least the total work divided by $m$ of its underlying allotment. By applying this to the allotment induced by the optimal schedule, and by using the fact that $\alpha^+$ minimizes the allotment cost, we get that

$$ \max\{L^+,\ W^+/m\} \le C^*_{\max}. \qquad (4) $$

We will now define and analyze an approximation algorithm B for problem Mt-Makespan. This approximation algorithm is based on the value $\mu(m)$ with $1 \le \mu(m) \le (m+1)/2$ as we defined in the paragraph after equation (2). To simplify the presentation, we will from now on briefly write $\mu$ for $\mu(m)$ and omit the dependence on $m$. Algorithm B is a generalization of Graham's […] well-known list scheduling algorithm for sequential tasks. The algorithm is described in Figure 2. The resulting schedule is denoted $\sigma^B$, the corresponding makespan is $C^B_{\max}$, the underlying allotment is $\alpha^B$, the total work in $\alpha^B$ is $W^B$, and the length of the critical path in $\alpha^B$ is $L^B$. The only difference between allotments $\alpha'$ and $\alpha^B$ is that the tasks using more than $\mu$ processors in $\alpha'$ are compressed to $\mu$ processors in $\alpha^B$. By part (b) of the monotonous penalty assumption, reducing the number of allotted processors cannot increase the work of a task. Together with inequalities (3) and (4) this yields

$$ W^B \le W' \le g\, m\, C^*_{\max}. \qquad (5) $$
The time interval from 0 to $C^B_{\max}$ is partitioned into three types of time slots: during the first type of time slot, at most $\mu - 1$ processors are busy; during the second type, at least $\mu$ and at most $m - \mu$ processors are busy; and during the third type, at least $m - \mu + 1$ processors are busy. The corresponding sets of time slots are denoted by $T_1$, $T_2$, and $T_3$, respectively. The overall length of the time slots in set $T_i$, $1 \le i \le 3$, is denoted by $|T_i|$. If $\mu \le m/2$, then every time slot from 0 to $C^B_{\max}$ belongs to exactly one of the three types, and all three types may actually occur. In the boundary case where $\mu = (m+1)/2$, every time slot from 0 to $C^B_{\max}$ belongs either to the first or to the third type. In this boundary case there are no time slots of the second type, since this would require that at least $(m+1)/2$ and at most $(m-1)/2$ processors are busy, which clearly is impossible. Since in either case these three types of time slots cover the whole interval from 0 to $C^B_{\max}$, we get that

$$ C^B_{\max} = |T_1| + |T_2| + |T_3|. \qquad (6) $$

Since during time slots of the first (respectively second and third) type at least one (respectively $\mu$ and $m - \mu + 1$) processors are busy, we get that

$$ W^B \ge |T_1| + \mu\,|T_2| + (m - \mu + 1)\,|T_3|. \qquad (7) $$



1. Initialization.

- Allot to task $j$ ($j = 1, \dots, n$) exactly $\alpha^B_j = \min\{\alpha'_j, \mu\}$ processors.

- This fixes the execution time $p^B_j$ and the work $w^B_j = \alpha^B_j \cdot p^B_j$ of every task $j$.

2. Repeat the following step until all tasks have been scheduled.

- Let Ready denote the set of unscheduled tasks whose predecessors all have already been scheduled.

- Compute for each task $j \in$ Ready the earliest possible start time under the allotment $\alpha^B$.

- Schedule the task in Ready with the smallest computed earliest start time (ties are broken in favor of tasks with smaller indices).

Fig. 2. Approximation algorithm B for problem Mt-Makespan.
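A minimal Python sketch of algorithm B from Fig. 2 is given below. It assumes execution times are provided by a callback `ptime(j, q)`, precedence constraints by predecessor lists, and that the $m$ processors form a pooled resource (a task only needs enough free processors, not specific ones); these modelling choices and all names are ours. The earliest start of a task is searched among its release time and the completion times of already placed tasks, since processor usage can only drop at a completion.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float, int]   # (start, end, processors) of a placed task


def busy_at(sched: List[Interval], x: float) -> int:
    """Processors busy at time x; task intervals are half-open [start, end)."""
    return sum(q for (s, e, q) in sched if s <= x < e)


def earliest_start(sched: List[Interval], release: float, dur: float,
                   procs: int, m: int) -> float:
    """Smallest t >= release such that procs processors are free on [t, t + dur)."""
    # Usage only drops at completion times, so an earliest feasible start is
    # the release time itself or a completion time after it.
    candidates = sorted({release} | {e for (_, e, _) in sched if e > release})
    for t in candidates:
        # Inside [t, t + dur) the usage can only increase at t or at a start time.
        checks = [t] + [s for (s, _, _) in sched if t < s < t + dur]
        if all(busy_at(sched, x) + procs <= m for x in checks):
            return t
    raise ValueError("a task asks for more than m processors")


def algorithm_b(alloc: List[int], ptime: Callable[[int, int], float],
                preds: List[List[int]], m: int, mu: int
                ) -> Tuple[Dict[int, float], float]:
    """List scheduling of Fig. 2; returns start times and the makespan."""
    n = len(alloc)
    a = [min(alloc[j], mu) for j in range(n)]    # step 1: cap allotments at mu
    p = [ptime(j, a[j]) for j in range(n)]
    start: Dict[int, float] = {}
    finish: Dict[int, float] = {}
    sched: List[Interval] = []
    while len(start) < n:                         # step 2
        ready = [j for j in range(n)
                 if j not in start and all(i in start for i in preds[j])]
        best = None
        for j in ready:
            release = max((finish[i] for i in preds[j]), default=0)
            t = earliest_start(sched, release, p[j], a[j], m)
            if best is None or (t, j) < best:     # ties -> smaller index
                best = (t, j)
        t, j = best
        start[j], finish[j] = t, t + p[j]
        sched.append((t, t + p[j], a[j]))
    return start, max(finish.values(), default=0)


# m = 4 processors, mu = mu(4) = 2; task 2 waits for tasks 0 and 1.
times = {0: {1: 8, 2: 4, 3: 3, 4: 2}, 1: {1: 3}, 2: {1: 6, 2: 3}}
start, cmax = algorithm_b([4, 1, 2], lambda j, q: times[j][q],
                          [[], [], [0, 1]], m=4, mu=2)
print(start, cmax)   # {0: 0, 1: 0, 2: 4} 7
```

In the example, task 0 is compressed from 4 to $\mu = 2$ processors, which leaves room for task 1 to run concurrently.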



Lemma 2. The sets $T_1$ and $T_2$ of time slots satisfy the following inequality with respect to the length $L'$ of the critical path in allotment $\alpha'$:

$$ |T_1| + \frac{\mu}{m}\,|T_2| \le L'. \qquad (8) $$

Proof. The idea is to construct a 'heavy' directed path $P$ in the transitive closure of the graph $G = (V, E)$. The last task in the path $P$ is any multiprocessor task $j_1$ that completes at time $C^B_{\max}$ in the schedule $\sigma^B$. After we have defined the last $i \ge 1$ tasks $j_i \to j_{i-1} \to \dots \to j_2 \to j_1$ on the path $P$, we find the next task $j_{i+1}$ as follows. Consider the latest time slot $t$ in $T_1 \cup T_2$ that lies before the starting time of task $j_i$ in $\sigma^B$. Consider the set $V'$ of tasks that consists of task $j_i$ and of all its predecessor tasks that start after time $t$ in $\sigma^B$. Since during time slot $t$ at most $m - \mu$ processors are busy, and since $\alpha^B$ allots at most $\mu$ processors to any task in $V'$, none of the tasks in $V'$ can be ready for execution during the time slot $t$. Hence, for every task in $V'$ some predecessor is being executed during the time slot $t$. As the next task $j_{i+1}$ on path $P$, we select any predecessor of task $j_i$ that is running during slot $t$. This procedure terminates when $P$ contains a task that starts before all time slots in $T_1 \cup T_2$.

Now consider a task $j$ on the resulting path $P$. If $\alpha^B$ allots fewer than $\mu$ processors to task $j$, then $\alpha'$ and $\alpha^B$ both allot the same number of processors to $j$. In this case the execution times of $j$ in $\alpha'$ and $\alpha^B$ are identical. In schedule $\sigma^B$ such a task $j$ may be executed during any time slot in $T_1 \cup T_2$. If $\alpha^B$ allots exactly $\mu$ processors to task $j$, then $\alpha'$ may allot any number $k$ of processors to $j$, where $\mu \le k \le m$. By part (b) of the monotonous penalty assumption, the work $\mu\, p^B_j$ in $\alpha^B$ is less than or equal to the work $k\, p'_j$ in $\alpha'$. Therefore, the execution time $p'_j$ of task $j$ in allotment $\alpha'$ is at least $\mu/k \ge \mu/m$ times the execution time $p^B_j$ of $j$ in allotment $\alpha^B$. In schedule $\sigma^B$ such a task $j$ may be executed during any time slot in $T_2$, but not during a time slot in $T_1$.

By our construction, the tasks on the directed path $P$ cover all time slots in $T_1 \cup T_2$ in schedule $\sigma^B$. Let us estimate the length $L'(P)$ of the path $P$ under the allotment $\alpha'$. The tasks that are executed during time slots in $T_1$ contribute a total length of at least $|T_1|$ to $L'(P)$. The tasks that are executed during time slots in $T_2$ contribute a total length of at least $|T_2|\,\mu/m$ to $L'(P)$. Since the length $L'$ of the critical path in $\alpha'$ is an upper bound on $L'(P)$, our proof is complete. ■

Now let us complete the proof of Theorem 2. Multiplying (6) by $m - \mu + 1$ and subtracting (7) from it yields

$$ (m - \mu + 1)\,C^B_{\max} \le W^B + (m - \mu)\,|T_1| + (m - 2\mu + 1)\,|T_2|. \qquad (9) $$

We distinguish two cases. In the first case we assume that $m/\mu \le (2m - \mu)/(m - \mu + 1)$. Then (2) yields $r(m) = (2m - \mu)/(m - \mu + 1)$. Moreover, the assumed inequality is equivalent to $(m - 2\mu + 1) \le \mu(m - \mu)/m$. Plugging this into (9), using (8) to bound $|T_1| + \mu|T_2|/m$, using (5) to bound $W^B$, and using (3) and (4) to bound $L'$ by $g\, C^*_{\max}$ altogether yields that

$$ \begin{aligned} (m - \mu + 1)\,C^B_{\max} &\le W^B + (m - \mu)\,|T_1| + \mu(m - \mu)\,|T_2|/m \\ &\le W^B + (m - \mu)\,L' \\ &\le m g\, C^*_{\max} + (m - \mu)\, g\, C^*_{\max} = (2m - \mu)\, g\, C^*_{\max}. \end{aligned} $$

Hence, in this case schedule $\sigma^B$ indeed yields a $g \cdot r(m)$-approximation for Mt-Makespan.

In the second case we assume that the inequality $m/\mu > (2m - \mu)/(m - \mu + 1)$ holds. Then (2) yields $r(m) = m/\mu$. Moreover, the assumed inequality is equivalent to $(m - \mu) < (m - 2\mu + 1)\,m/\mu$. By plugging this into (9) and by using similar arguments as in the first case, we conclude that






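$$ \begin{aligned} (m - \mu + 1)\,C^B_{\max} &\le W^B + (m - 2\mu + 1)\,m\,|T_1|/\mu + (m - 2\mu + 1)\,|T_2| \\ &\le W^B + (m - 2\mu + 1)\,m\,L'/\mu \\ &\le m g\, C^*_{\max} + (m - 2\mu + 1)\,m g\, C^*_{\max}/\mu \\ &= (m - \mu + 1)\,m g\, C^*_{\max}/\mu. \end{aligned} $$

Hence, also in the second case schedule $\sigma^B$ yields a $g \cdot r(m)$-approximation for Mt-Makespan. The proof of Theorem 2 is complete.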
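Both case conditions and their stated equivalents reduce to the sign of one quadratic in $\mu$. Since $m \ge 1$, $\mu \ge 1$ and $m - \mu + 1 \ge 1$ (because $\mu \le (m+1)/2 \le m$), multiplying through by these quantities preserves every inequality. A short derivation:

```latex
\begin{aligned}
\frac{m}{\mu} \le \frac{2m-\mu}{m-\mu+1}
  &\iff m(m-\mu+1) \le \mu(2m-\mu)
   \iff m^2 - 3m\mu + \mu^2 + m \le 0, \\
(m-2\mu+1) \le \frac{\mu(m-\mu)}{m}
  &\iff m^2 - 2m\mu + m \le m\mu - \mu^2
   \iff m^2 - 3m\mu + \mu^2 + m \le 0, \\
(m-\mu) < \frac{(m-2\mu+1)\,m}{\mu}
  &\iff m\mu - \mu^2 < m^2 - 2m\mu + m
   \iff m^2 - 3m\mu + \mu^2 + m > 0.
\end{aligned}
```

The smaller root of $\mu^2 - 3m\mu + m^2 + m = 0$ is $\mu = \tfrac{1}{2}\bigl(3m - \sqrt{5m^2 - 4m}\bigr)$, which is exactly the point where the two terms inside the maximum in the definition of $r(m)$ coincide.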

4 From a Pseudo-Polynomial Time Algorithm to an FPTAS

In this section we will prove Theorem 3. Our first goal is to get a fast algorithm for the following auxiliary allotment problem Mt-Allotment on series parallel precedence graphs: we assume that we are given an instance $I$ of Mt-Allotment, a positive real $\varepsilon$, and an a priori bound $X$ such that there exists an allotment for $I$ with cost at most $X$. Our goal is to find within polynomial time an allotment $\alpha$ that satisfies $c(\alpha) \le (1+\varepsilon)X$.

Define $Z = \varepsilon X/n$. Furthermore, define a scaled instance $I'$ by setting $p'_{j,q} = \lfloor p_{j,q}/Z \rfloor$ for all tasks $j$ and all $1 \le q \le m$, while keeping the same precedence constraints as in instance $I$. Note that $p_{j,q} < Z(p'_{j,q} + 1)$. Moreover, note that instance $I'$ must have an allotment of cost at most $X/Z$, since the original instance $I$ had some allotment of cost at most $X$. Take the pseudo-polynomial time algorithm that exists according to the assumption of Theorem 3 and apply it to the scaled instance $I'$ with bound $\lfloor X/Z \rfloor$. Denote the resulting allotment by $\alpha$ with $c(\alpha) \le \lfloor X/Z \rfloor$, and interpret allotment $\alpha$ for $I'$ as an allotment $\beta$ for the original instance $I$. Consider an arbitrary path $P$ with $|P|$ tasks in allotment $\beta$. Then

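$$ \sum_{j \in P} p_{j,\beta_j} \le \sum_{j \in P} Z\,(p'_{j,\alpha_j} + 1) = Z|P| + Z \sum_{j \in P} p'_{j,\alpha_j} \le Zn + Z L^\alpha. \qquad (10) $$

This implies $L^\beta \le Zn + Z L^\alpha$. Moreover,

$$ W^\beta = \sum_{j \in V} \beta_j\, p_{j,\beta_j} \le \sum_{j \in V} \beta_j\, Z\,(p'_{j,\alpha_j} + 1) \le Zmn + Z W^\alpha. \qquad (11) $$

This implies $W^\beta \le Zmn + Z W^\alpha$. Putting things together we conclude that

$$ c(\beta) = \max\{L^\beta,\ W^\beta/m\} \le \max\{Zn + Z L^\alpha,\ Zn + Z W^\alpha/m\} = Zn + Z\,c(\alpha) \le \varepsilon X + Z(X/Z) = (1+\varepsilon)X. $$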

Hence, the cost of allotment $\beta$ for $I$ is at most $(1+\varepsilon)X$ as desired. By the assumption of Theorem 3, the time to find $\beta$ is polynomially bounded in the size of $I$ and in $X/Z = n/\varepsilon$. To summarize, we can solve our auxiliary problem and find the desired allotment within a running time that is polynomially bounded in the size of $I$ and in $1/\varepsilon$.

It remains to get rid of the assumption that we have a priori knowledge of the bound $X$. Let $P = \sum_j p_{j,1}$ denote the total execution time of all tasks in $I$ when they are executed on a single processor. By the monotonous penalty assumption, every critical path in every allotment for $I$ has length at most $P$, and also the average work of every allotment is at most $P$. Therefore, the cost of the optimal allotment is at most $P$, and we can find a $(1+\varepsilon)$-approximation by performing a binary search over the interval from 1 to $P$. This completes the proof of Theorem 3.
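The rounding step at the heart of this proof is easy to state in code. The following is a minimal sketch of just that step, assuming execution times are given as exact rationals in a per-task dictionary `p[j][q]`; the pseudo-polynomial decision algorithm and the outer binary search are not included, and the function name is our own.

```python
import math
from fractions import Fraction
from typing import Dict, List, Tuple


def scale_instance(p: List[Dict[int, Fraction]], eps: Fraction, X: Fraction
                   ) -> Tuple[Fraction, List[Dict[int, int]], int]:
    """Rounding step of Section 4: Z = eps * X / n, p'[j][q] = floor(p[j][q] / Z).

    Returns Z, the scaled execution times, and the scaled bound floor(X / Z),
    which would be handed to the (assumed) pseudo-polynomial decision algorithm.
    Exact rationals keep floor() from being thrown off by rounding errors.
    """
    n = len(p)
    Z = eps * X / n
    scaled = [{q: math.floor(t / Z) for q, t in row.items()} for row in p]
    return Z, scaled, math.floor(X / Z)


p = [{1: Fraction(7), 2: Fraction(4)}, {1: Fraction(5), 2: Fraction(3)}]
Z, scaled, bound = scale_instance(p, eps=Fraction(1, 2), X=Fraction(10))
print(Z, scaled, bound)   # 5/2 [{1: 2, 2: 1}, {1: 2, 2: 1}] 4
```

The scaled bound equals $n/\varepsilon$ here, which is why the running time of the decision algorithm on $I'$ is polynomial in the size of $I$ and in $1/\varepsilon$.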

5 Allotments for Special Classes of Precedence Constraints

In this section we will prove Theorem 4. Hence, we are given an instance $I$ of Mt-Allotment where the precedence graph $G = (V, E)$ is series parallel, together with a positive integer bound $X$. Our goal is to decide within pseudo-polynomial time whether there exists an allotment $\alpha$ with cost $c(\alpha) \le X$.



1. Initialization of leaf vertices.

- For every leaf $v$ of the decomposition tree and for every $\ell$ with $0 \le \ell \le X$, set
  $F[v,\ell] := \min_{1 \le q \le m} \{\, q \cdot p_{v,q} : p_{v,q} \le \ell \,\}$ (the minimum over an empty set is $\infty$).

2. Handling interior vertices of the decomposition tree.

- For every interior vertex $v$ with left child $v_1$ and right child $v_2$ and for every $\ell$ with $0 \le \ell \le X$ do the following:

  - If $v$ is a p vertex, then $F[v,\ell] := F[v_1,\ell] + F[v_2,\ell]$.

  - If $v$ is an s vertex, then $F[v,\ell] := \min_{1 \le k \le \ell-1} \{ F[v_1,k] + F[v_2,\ell-k] \}$.

3. Termination.

- Answer YES if there exists some $1 \le \ell \le X$ with $F[root,\ell]/m \le X$. Otherwise, answer NO.

Fig. 3. A dynamic programming algorithm for computing $F[v,\ell]$.



It is well known that a series parallel graph can be decomposed in polynomial time into its atomic parts according to the series and parallel compositions. Essentially, such a decomposition corresponds to a rooted, ordered, binary tree where all interior vertices are labeled by s or p (series or parallel composition) and where all leaves correspond to single vertices of the precedence graph $G$. We associate with every interior vertex $v$ of the decomposition tree the series parallel graph $G(v)$ induced by the leaves of the subtree below $v$. Note that for the root vertex we have $G(root) = G$. For a vertex $v$ in the decomposition tree, and for an integer $\ell$ with $1 \le \ell \le X$, we denote by $F[v,\ell]$ the smallest possible value $w$ with the following property: there exists an allotment $\alpha$ for the tasks in $G(v)$ with $L^\alpha \le \ell$ and $W^\alpha \le w$. It is easy to compute all such values $F[v,\ell]$ by a dynamic programming approach that starts in the leaves of the decomposition tree and then moves upwards towards the root. This algorithm is sketched in Figure 3. The time complexity of this dynamic programming algorithm is $O(nmX^2)$.
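A minimal Python sketch of the dynamic program of Fig. 3 follows. It assumes integer execution times `p[j][q]` for every $1 \le q \le m$ and a decomposition tree given as nested tuples, `("leaf", j)` or `("s" | "p", left, right)`; this encoding and the function names are our own. Because $F[v,\ell]$ is nonincreasing in $\ell$, checking $\ell = X$ is equivalent to the termination test of Fig. 3.

```python
import math
from typing import Dict, List

INF = math.inf


def dp_table(node, p: List[Dict[int, int]], m: int, X: int) -> List[float]:
    """F[l] for 0 <= l <= X: least total work of an allotment of G(node)
    whose critical path has length at most l (INF if there is none)."""
    if node[0] == "leaf":
        j = node[1]
        return [min((q * p[j][q] for q in range(1, m + 1) if p[j][q] <= l),
                    default=INF)
                for l in range(X + 1)]
    F1, F2 = dp_table(node[1], p, m, X), dp_table(node[2], p, m, X)
    if node[0] == "p":   # parallel: both parts must fit into length l
        return [F1[l] + F2[l] for l in range(X + 1)]
    if node[0] == "s":   # series: split the length budget l between the parts
        return [min(F1[k] + F2[l - k] for k in range(l + 1)) for l in range(X + 1)]
    raise ValueError(f"unknown node type {node[0]!r}")


def decide(tree, p: List[Dict[int, int]], m: int, X: int) -> bool:
    """YES iff some allotment has L <= X and W / m <= X, i.e. cost <= X."""
    return dp_table(tree, p, m, X)[X] <= m * X


# Tasks 0 and 1 in parallel, followed by task 2; m = 2 processors.
p = [{1: 4, 2: 2}, {1: 2, 2: 2}, {1: 6, 2: 3}]
tree = ("s", ("p", ("leaf", 0), ("leaf", 1)), ("leaf", 2))
print(decide(tree, p, m=2, X=6), decide(tree, p, m=2, X=5))   # True False
```

Here every allotment has total work at least 12, so no allotment has cost 5 or less, while allotting 2, 1 and 2 processors achieves cost 6.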

This completes the proof of Theorem 4. Theorem 5 can be proved by a similar dynamic programming approach. The exact arguments are omitted from this extended abstract.
Abstract. The minimum test collection problem is defined as follows. Given a ground set $S$ and a collection $C$ of tests (subsets of $S$), find the minimum subcollection $C'$ of $C$ such that for every pair of elements $\{x, y\}$ in $S$ there exists a test in $C'$ that contains exactly one of $x$ and $y$. It is well known that the greedy algorithm gives a $1 + 2\ln n$ approximation for the test collection problem, where $n = |S|$ is the size of the ground set. In this paper, we show that this algorithm is close to the best possible, namely that there is no $o(\log n)$-approximation algorithm for the test collection problem unless P = NP.

We give approximation algorithms for this problem in the case when all the tests have a small cardinality, significantly improving the performance guarantee achievable by the greedy algorithm. In particular, for instances with test sizes at most $k$ we derive an $O(\log k)$ approximation. We show APX-hardness of the version with test sizes at most two, and present an approximation algorithm with ratio $7/6 + \varepsilon$ for any fixed $\varepsilon > 0$.



1 Introduction and Motivation 

The test collection problem arises naturally in the following general setting of identification problems: given a set of individuals (database entries) and a set of binary attributes that may or may not occur in each individual, the goal is to find a minimum subset of attributes (a test collection) such that each individual can be uniquely identified from the information on which of this subset of attributes it contains. In this way, the incidence vector of any individual with the test collection is a unique binary signature for it, distinguishing it from other individuals and thus uniquely identifying it from the list of individuals. This problem is also commonly known in the literature as the minimum test set problem […] and the minimum test cover problem […], and arises commonly in fault analysis, medical diagnostics and pattern recognition (see, e.g., […]).
The test collection problem first came to our attention in the setting of protein identification in computational biology […]. A major thrust in post-genome biology is proteomics, where one analyzes all cellular proteins in a high-throughput manner. We are concerned with the protein identification aspect of proteomics.

Halldorsson, Minden and Ravi [7] proposed a new approach of using an array of antibodies that recognize and bind specifically to short peptide sequences (called epitopes); such an epitope can distinguish proteins that contain this epitope from those that do not. Each antibody binds its cognate epitope by virtue of complementary three-dimensional structure. To observe the occurrence of epitope binding, the epitopes will be fluorescently tagged. The binding of antibodies in the chip to the given unidentified protein can then be measured using a fluorescence detector. Thus the final output is a binary indicator vector of dimension equal to the number of antibodies in the chip, each bit indicating whether or not the protein is bound to the corresponding antibody.

The proposal in [7] is to generate a set of antibodies that recognize a set
of epitopes that are shared by many proteins in such a way that the entire set 
of epitopes covers all possible proteins in the organism’s proteome. Moreover, 
the set of antibodies will be selected such that each protein will be recognized 
by a unique subset of antibodies, i.e., each protein will have a unique signature 
of binding antibodies. This problem can hence be viewed as a test collection 
problem where the elements of the ground set are the proteins and there is 
a test for each antibody, determined by the binding of the antibodies to the 
proteins. 

The novelty of our approach is the utilization of antibodies that bind to 
a large subset of proteins. However, most currently known antibodies are very 
specific in their binding characteristics. Hence a special case of particular interest 
is one where each antibody binds only to a few of the proteins, i.e., when the test
sets are small. In Section 5 we give two results for this case.



Our Results 

The test collection problem is NP-hard, as was shown by Garey and Johnson […] via a reduction from three-dimensional matching. By a natural reduction of the problem to the set covering problem (see e.g., […]) and an application of the results of Johnson […] or Lovasz […], the natural greedy algorithm gives a $1 + 2\ln n$ approximation for the test collection problem, where $n$ is the number of elements in the ground set. We show that the greedy algorithm has optimal approximation ratio up to a constant multiple unless P = NP. Motivated by moving towards a practical solution, we consider the case when all test sizes are at most $k$, and give a new algorithm with $O(\log k)$ approximation guarantee, significantly improving the straightforward $O(\log((n-k)k))$ approximation guarantee of the greedy algorithm. We further consider the case where $k = 2$ and give an APX-hardness proof and an algorithm with approximation ratio $7/6 + \varepsilon$, for any $\varepsilon > 0$.
Previous Work

Subsequent to the acceptance of this extended abstract, we discovered that many of the results we present here have been obtained independently earlier. Moret and Shapiro […] already described the crux of our reduction of a set covering problem to a test collection problem outlined in the proof of Theorem 1. Their result, combined with the hardness of approximation results on set covering […], already implies a logarithmic lower bound on the approximation ratio of minimum test collection (Theorems … and …).

For the cases with small test sizes, we recently discovered that previous work by C.A.J. Hurkens, J.K. Lenstra and L. Stougie (unpublished, but documented in […]; personal communication with L. Stougie, June 2001) obtained many of our approximation results independently, including the following: an $O(\log k)$-approximation algorithm for problem instances with test sizes at most $k$ (Theorem …) and, for instances with test sizes at most two, (i) a …-approximation guarantee for the greedy algorithm, and (ii) a sequence of improved heuristics employing increasingly extensive local search. A case analysis for determining the approximation ratio of these heuristics using linear programming [3] shows that the fourth member in the family achieves an approximation ratio of … while the fifth one has an even better ratio. The approximation ratio of this sequence of heuristics is conjectured to converge to one. Our APX-hardness result for the problem with test sizes two shows that, even if the conjecture held true, the trade-off between performance ratio and running time (the convergence "rate") cannot be that of a PTAS, unless P = NP […]. These authors also provide an NP-hardness proof for the case with test sizes at most two by a direct reduction from a two-path packing problem (we use a similar reduction in Algorithm …).

In Section 2 we give different formulations of the test collection problem and prove their equivalence. In Section 3 we show how the test collection problem relates to the set covering problem. In Section 4 we show the hardness of approximation of test collection. Finally, in Section 5 we prove our results for the case of small test sets.

2 Formulations 

We give three equivalent formulations of the test collection problem. 
Definition 1. Given a collection $C$ of subsets of a finite set $S$, find a minimum subcollection $C' \subseteq C$ such that for each pair of distinct elements $x, y \in S$ there exists $C \in C'$ that contains exactly one of $x$ and $y$.

Definition 2. Given a complete graph on the node set $S$, and a set of cuts $C = \{C_1, C_2, \dots, C_m\}$, find a minimum subcollection $C' \subseteq C$ the union of whose edges is the entire edge set of the complete graph.

Definition 3. Given a collection $C$ of subsets of a finite set $S$, find a minimum subcollection $\{C_1, C_2, \dots, C_m\} = C' \subseteq C$ such that the vectors

$$ I_x = (\delta(C_1,x), \delta(C_2,x), \dots, \delta(C_m,x)), $$

where $\delta(C_i,x)$ is one if $x$ is in $C_i$ and zero otherwise, are unique for all $x \in S$.
The first definition allows for a natural reduction of the problem to the set covering problem. The second definition allows for a graphical representation of the tests as cuts in a graph. We will henceforth refer to the test $C$ as the cut $C$, as a shorthand for the cut $C : (S - C)$, i.e., the set of edges with exactly one endpoint in $C$.

The last definition relates best to our motivating example: here we can think of $S$ as our set of proteins and of $C_i$ as the set of proteins that contain the $i$-th epitope (i.e., those proteins that the corresponding antibody recognizes).
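Definitions 1 and 3 can be checked mechanically on small instances. The sketch below tests a candidate subcollection both ways: by asking that every pair be separated (Definition 1) and by asking that the incidence vectors $I_x$ be pairwise distinct (Definition 3). The protein names and epitope sets are hypothetical illustration data.

```python
from itertools import combinations
from typing import Hashable, List, Set


def is_test_collection(S: Set[Hashable], tests: List[Set[Hashable]]) -> bool:
    """Definition 3: the incidence vectors I_x are pairwise distinct."""
    signatures = {tuple(x in C for C in tests) for x in S}
    return len(signatures) == len(S)


def separates_all_pairs(S: Set[Hashable], tests: List[Set[Hashable]]) -> bool:
    """Definition 1: every pair {x, y} is split by a test containing exactly one."""
    return all(any((x in C) != (y in C) for C in tests)
               for x, y in combinations(S, 2))


# Hypothetical proteins and epitope tests (which proteins each antibody binds).
proteins = {"P1", "P2", "P3", "P4"}
epitopes = [{"P1", "P2"}, {"P1", "P3"}, {"P2", "P3", "P4"}]
print(is_test_collection(proteins, epitopes[:2]))   # True
print(is_test_collection(proteins, epitopes[:1]))   # False
```

With the first two epitopes, the four proteins receive the distinct signatures (1,1), (1,0), (0,1) and (0,0).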

The following lemmas are immediate. 

Lemma 1. All three Definitions 1, 2 and 3 of the test collection problem are
equivalent.

Lemma 2. The size of any solution to the test collection problem is at least $\lceil \log_2 |S| \rceil$.

Proof. We note by Definition 3 that all the elements of $S$ must have a unique binary incidence pattern with the solution. Therefore, if $m$ is the size of a solution to a test collection problem, then $2^m \ge |S|$, or $m \ge \lceil \log_2 |S| \rceil$. □

A special case of the test collection problem as defined in Definition 2 is one where any cut can be chosen in the given complete graph […]. It is easy to show that this problem has a simple optimal solution of size $\lceil \log_2 n \rceil$, where $n$ is the cardinality of the node set $S$ (interpreting the bits in a binary labeling of the nodes in $S$ as defining the shores of cuts gives this result).
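The binary-labeling argument can be sketched as follows: give the elements distinct labels $0, \dots, n-1$ and let test $i$ contain the elements whose label has bit $i$ set. The function name is our own, and sorting is only used to fix an arbitrary labeling.

```python
from typing import Hashable, List, Set


def binary_label_tests(S: Set[Hashable]) -> List[Set[Hashable]]:
    """Optimal tests when every cut may be used: give the elements distinct
    labels 0..n-1 and let test i contain the elements whose label has bit i set."""
    elems = sorted(S)
    k = (len(elems) - 1).bit_length() if elems else 0   # = ceil(log2 n) for n >= 1
    return [{x for label, x in enumerate(elems) if (label >> i) & 1} for i in range(k)]


tests = binary_label_tests(set("abcde"))
print(len(tests))   # 3 = ceil(log2 5)
```

Distinct labels give distinct bit vectors, so the incidence vectors are unique, and the number of tests matches the lower bound of Lemma 2.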

3 Relation to Set Covering 

We first define the set covering problem. 

Definition 4. Given a ground set $S$ and a collection of subsets $C = \{C_1, C_2, \dots, C_m\}$, find a minimum cardinality subcollection $C' \subseteq C$ such that for all $e \in S$ there exists a $C \in C'$ such that $e \in C$.

The following lemma is independently due to Johnson […] and Lovasz […].

Lemma 3. The greedy algorithm gives a $1 + \ln k$ approximation for the set covering problem, where $k$ is the size of the largest set.

Considerable work has been done on the approximation of set covering. Lund and Yannakakis […] showed that set covering cannot be approximated within $c \ln n$ for any $c < \ldots$ unless $NP \subseteq DTIME(n^{\mathrm{poly}\log n})$. Feige […] improved on their result and showed that set covering cannot be approximated within $(1-\varepsilon)\ln n$ unless $NP \subseteq DTIME(n^{O(\log\log n)})$. Finally, Arora and Sudan [2] showed that set cover cannot be approximated to within $o(\log n)$ unless P = NP.

We note a natural reduction of test collection to set covering (see also […]).
This reduction is depicted in Figure 1.

Lemma 4. The test collection problem is reducible to a set covering problem. 
Fig. 1. The left figure shows a test collection instance and the right figure its reduction to a set covering problem. Each pair of elements in the test collection problem gets mapped to an element in the ground set of the set covering problem. A test set $C$ gets mapped to a set that covers all the pairs in the cut $C : (S - C)$.



Proof. Let $C_{TC}, S_{TC}$ be an instance of the test collection problem. We will now construct an instance $C_{SC}, S_{SC}$ of the set covering problem. For all unordered pairs of elements $\{x, y\} \subseteq S_{TC}$, we define an element $e_{xy} \in S_{SC}$. For each test $C_{tc} \in C_{TC}$ we construct a corresponding set $C_{sc} \in C_{SC}$ by letting $e_{xy} \in C_{sc}$ when exactly one of $x$ or $y$ is in $C_{tc}$. It is not hard to verify, using Definition 1, that any solution to the resulting set cover problem is also a feasible solution to the test collection problem: the element $e_{xy}$ is covered exactly when some chosen test contains exactly one of $x$ and $y$. □

Note the quadratic increase in the size of the ground set $|S_{SC}|$ of the resulting set cover problem as compared with $|S_{TC}|$. If there are $n$ elements in the original test collection problem, there will be $\binom{n}{2}$ elements in its set covering formulation. Notice also that, given a partial solution to the test collection problem, we can also map the residual problem to a set covering problem using the above method.

As noted by Moret and Shapiro […], Lemma 4 combined with Lemma 3
implies a $1 + 2\ln n$ approximation guarantee for the greedy algorithm.
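The greedy algorithm on the pair reduction of Lemma 4 can be sketched as follows. A test $C$ separates $|C|(n - |C|) \le n^2/4$ pairs, so Lemma 3 gives the guarantee $1 + \ln(n^2/4) \le 1 + 2\ln n$. The function name and the tie-breaking (first test with maximum gain) are our own choices.

```python
from itertools import combinations
from typing import Hashable, List, Set


def greedy_test_collection(S: Set[Hashable], tests: List[Set[Hashable]]) -> List[int]:
    """Greedy set cover on the pair reduction of Lemma 4.

    Ground elements are the unordered pairs {x, y}; test C covers the pairs it
    separates (exactly one of x, y in C).  Returns indices of chosen tests.
    """
    pairs = [frozenset(pair) for pair in combinations(S, 2)]
    covers = [{pr for pr in pairs if len(pr & C) == 1} for C in tests]
    uncovered = set(pairs)
    chosen: List[int] = []
    while uncovered:
        # Pick the test separating the most still-undistinguished pairs.
        best = max(range(len(tests)), key=lambda i: len(covers[i] & uncovered),
                   default=None)
        if best is None or not covers[best] & uncovered:
            raise ValueError("the tests cannot distinguish all pairs")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


S = {1, 2, 3, 4}
tests = [{1}, {2}, {3}, {1, 2}, {1, 3}]
chosen = greedy_test_collection(S, tests)
print(chosen)   # [3, 4]
```

In the example, the first step picks $\{1,2\}$ (4 pairs), after which only $\{1,2\}$ and $\{3,4\}$ remain undistinguished, and $\{1,3\}$ separates both.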

Recall (Definition 2) that the test collection problem can also be viewed as a form of cut-covering problem in a complete graph. Given a partial solution to the test collection problem, as in Definition 2 we can view the remaining problem on a graph whose node set is the elements of the test collection problem and whose edge set consists only of the pairs of elements that have not been distinguished by the partial solution. The use of a partial solution leads us to generalize the test collection problem.

Definition 5. Given a graph $G = (V, E)$ and a set of cuts $C = \{C_1, C_2, \dots, C_m\}$, find a minimum subcollection $C' \subseteq C$ the union of whose edges is $E$.

We note that this is a generalization, as the original test collection problem in Definition 2 occurs as the special case when the graph is complete. As before, in the special case when the tests can be any subset of $S$ (i.e., when all cuts in the graph are allowed in the cover), this problem is known simply as the cut cover problem […]. Motwani and Naor […] have shown that the cut cover problem is NP-hard by showing a relation to the graph coloring problem. Using an algorithm of Halldorsson […] they give an algorithm whose solution is no larger than the size of the optimal solution plus $\ln n - 3\ln\ln n$. Using a hardness of approximation result of Lund and Yannakakis […] for graph coloring, they show that unless P = NP cut cover is hard to approximate within an additive $\varepsilon \ln n$ for some $\varepsilon > 0$. Combining their result with the more recent work of Feige and Killian […], it can be shown that unless ZPP = NP no polynomial-time algorithm can approximate cut cover to within an additive $(1-\varepsilon)\ln n$ for any $\varepsilon > 0$.

The NP-hardness of the test collection problem is shown in [...] via a reduction from 3-dimensional matching. We now give a reduction from the set covering problem to the test collection problem. It will also be useful in showing near-optimal approximation hardness results for the test collection problem.

Theorem 1. The set covering problem is reducible in polynomial time to the 
test collection problem. 

Proof. Let C_SC, S_SC be an instance of a set covering problem, where S_SC is the ground set and C_SC is a collection of subsets of S_SC, and let n = |S_SC|.

Let us now construct C_TC, S_TC. Let there be two elements e_l, e_r ∈ S_TC for every element e ∈ S_SC. Now arbitrarily assign distinct numbers from 1 through n to the elements of S_SC. Construct the sets C_i^b ∈ C_TC, i ∈ {1, 2, ..., ⌈log₂ n⌉}, where for every element e ∈ S_SC both e_l, e_r ∈ C_i^b if the binary representation of e has a 1 in the i-th bit. Furthermore, for every set C_SC ∈ C_SC construct the corresponding set C_TC ∈ C_TC such that e_r ∈ C_TC if and only if e ∈ C_SC. This reduction is depicted in Figure 2.




Fig. 2. The reduction of set cover to test collection. Each node in the set covering instance is mapped to two nodes in the test collection instance, and ⌈log₂ n⌉ new tests are added.



In the previous construction we can think of the sets C_i^b as forming a hypercube-like pattern, each splitting the ground set into two approximately equal parts. If the cardinality of the original set covering ground set is n, then there are ⌈log₂ n⌉ sets C_i^b. We must include all of them because the n elements e_l have to be distinguished from each other. These sets are the only ones in the collection that contain any e_l, and n elements need at least ⌈log₂ n⌉ tests to receive distinct incidence patterns. Clearly the sets C_i^b are sufficient to distinguish between all the e_l elements.

Let C′_SC be any set cover for SC, and let C′_TC be the collection of tests C_TC such that C_TC ∈ C′_TC if and only if the corresponding set C_SC ∈ C′_SC. Then C′_TC ∪ {C_i^b} is a solution to the reduced test collection problem. The sets C_i^b distinguish between all pairs except the (e_l, e_r) pairs. Since the sets in C′_SC cover all the elements in the ground set of SC, each element e_r is contained in at least one of the tests in C′_TC, which distinguishes it from its neighbor e_l.

Now let C′_TC be any test collection for the test collection problem obtained by reducing SC. Because the sets C_i^b must all be included, we have {C_i^b} ⊆ C′_TC. We can construct a solution to SC by letting C_SC ∈ C′_SC if and only if C_TC ∈ C′_TC \ {C_i^b}. Each element e_r is distinguished from its neighbor e_l in the test collection solution. Since every set C_i^b contains either both or neither of e_l and e_r, e_r must be contained in one of the sets C_TC ∈ C′_TC \ {C_i^b}. The sets C′_SC therefore cover the ground set of SC.

The arguments in the two previous paragraphs give the following lemma. It relates the size of a solution to the reduced problem to the size of a solution to the original set cover problem.

Lemma 5. An instance of a set covering problem has a solution of size k if and only if its reduction to a test collection problem described above has a solution of size k + ⌈log₂ n⌉.

It is easy to verify that all the steps in the reduction described above can be implemented in polynomial time.

□ 
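The construction of Theorem 1 and the size relation of Lemma 5 can be checked by brute force on toy instances. In the sketch below, the tags `'l'` and `'r'` stand for e_l and e_r, and the helper names are assumptions. The expression `(n - 1).bit_length()` is used as an exact integer form of ⌈log₂ n⌉. The exhaustive search is exponential and is intended only for tiny examples.

```python
from itertools import combinations


def sc_to_tc(n, cover_sets):
    """Theorem 1 construction (sketch). Set cover elements are 1..n; element
    e becomes the two test collection elements ('l', e) and ('r', e)."""
    bits = (n - 1).bit_length()  # equals ceil(log2 n) for n >= 1
    ground = [(side, e) for e in range(1, n + 1) for side in "lr"]
    # Bit test i holds both copies of e iff bit i of e is 1; the numbers
    # 1..n stay distinct modulo 2**bits, so these tests separate all e_l.
    bit_tests = [frozenset((side, e) for e in range(1, n + 1) if (e >> i) & 1
                           for side in "lr")
                 for i in range(bits)]
    # One test per covering set, holding only the right copies.
    set_tests = [frozenset(("r", e) for e in c) for c in cover_sets]
    return ground, bit_tests + set_tests


def smallest(candidates, ok):
    """Size of a smallest subcollection accepted by `ok` (exhaustive search)."""
    for r in range(len(candidates) + 1):
        for combo in combinations(candidates, r):
            if ok(list(combo)):
                return r
    return None


def opt_sc(n, cover_sets):
    universe = set(range(1, n + 1))
    return smallest(cover_sets, lambda c: set().union(*c) == universe)


def opt_tc(ground, tests):
    return smallest(tests, lambda c: len({tuple(x in t for t in c)
                                          for x in ground}) == len(ground))
```

On the small instances in the check, the optimum of the reduced instance exceeds the set cover optimum by exactly ⌈log₂ n⌉, as Lemma 5 states.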



4 Hardness of Approximating Test Collection 

In this section we prove that test collection is as hard to approximate as the set covering problem.

We derive the approximation hardness of test collection from the corresponding hardness of approximating set cover. Let φ be a polynomial-time algorithm that approximates test collection. Let SC be an instance of the set covering problem and n be the number of elements in the ground set of SC. By Lemma 5, the optimal solution to the test collection reduction of the set covering problem is ⌈log₂ n⌉ larger than the optimal solution to SC. Hence this reduction does not directly transfer hardness of approximation. To translate the hardness, we make multiple copies of the original set covering problem and apply the reduction to the copied instance, which drives down the logarithmic additive error.

Let k be a positive integer. The following is our approximation algorithm for the set covering problem, which makes a call to an approximation algorithm φ for test collection.

Algorithm 1 TCtoSC(SC, φ, k)

Multiply the original set covering problem k times (take k disjoint copies) to construct a set covering instance SC_k.

Reduce SC_k to a test collection problem using the reduction in Lemma 5. Denote this reduced problem TC_k.

Solve TC_k using the approximation algorithm φ for the test collection problem.

Construct a solution to SC_k from the solution to TC_k.

Divide the solution of SC_k into k solutions, one for each of the individual copies of SC, and output the best (minimum size) solution among them.

□ 
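Below is a sketch of Algorithm 1 under explicit assumptions. Element e of copy c is renamed c·n + e. Since the text does not fix a concrete approximation algorithm φ, a greedy pair-cover heuristic (`greedy_tc`, hypothetical) stands in for it. Any φ that returns a valid test collection works here, because every valid solution contains all bit tests and covers each copy.

```python
from itertools import combinations


def sc_to_tc(n, sets):
    """Theorem 1 reduction (sketch): ('l', e), ('r', e) per element e in 1..n."""
    bits = (n - 1).bit_length()  # ceil(log2 n) for n >= 1
    bit_tests = [frozenset((s, e) for e in range(1, n + 1) if (e >> i) & 1
                           for s in "lr") for i in range(bits)]
    set_tests = [frozenset(("r", e) for e in c) for c in sets]
    ground = [(s, e) for e in range(1, n + 1) for s in "lr"]
    return ground, bit_tests + set_tests, bits


def greedy_tc(ground, tests):
    """Stand-in for phi: greedy cover of all element pairs by the pairs
    each test separates (the test collection to set cover reduction)."""
    pairs = {frozenset(q) for q in combinations(ground, 2)}
    sep = [{q for q in pairs if len(q & t) == 1} for t in tests]
    chosen = []
    while pairs:
        i = max(range(len(tests)), key=lambda j: len(sep[j] & pairs))
        if not sep[i] & pairs:
            raise ValueError("infeasible test collection instance")
        chosen.append(i)
        pairs -= sep[i]
    return chosen


def tc_to_sc_algorithm(n, sets, phi, k):
    """Sketch of Algorithm 1 TCtoSC(SC, phi, k); returns set indices."""
    m = len(sets)
    # SC_k: k disjoint copies; copy c renames element e to c * n + e.
    big_sets = [{c * n + e for e in s} for c in range(k) for s in sets]
    ground, tests, nbits = sc_to_tc(k * n, big_sets)
    chosen = phi(ground, tests)
    # Tests after the nbits bit tests are the sets of SC_k; set j of SC_k
    # is set j % m of copy j // m.
    per_copy = [[] for _ in range(k)]
    for i in chosen:
        if i >= nbits:
            per_copy[(i - nbits) // m].append((i - nbits) % m)
    return min(per_copy, key=len)
```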



Observation 1. Algorithm 1 runs in time polynomial in n and k.

It follows that if k is a polynomial function of n, Algorithm 1 runs in time polynomial in n.

We now prove a technical lemma to be used in our approximation hardness 
result. 

Lemma 6. Let φ be an algorithm with performance guarantee A_φ(n), where n is the number of elements in the ground set of the test collection problem. The size of the solution returned by the algorithm TCtoSC(SC, φ, k) is at most A_φ(2kn)(⌈log₂ kn⌉/k + Opt(SC)), where Opt(SC) denotes the size of the optimal solution to the set covering instance SC.

The proof follows by observing that the test collection instance TC_k has 2kn elements, applying Lemma 5, and bounding the size of the output solution to SC by the average size of the solutions induced on the k copies.
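Spelled out, the proof sketch amounts to the following chain. It is reconstructed from the argument above rather than quoted verbatim. It uses two facts: the k disjoint copies in SC_k have optimum k · Opt(SC), and every valid solution to TC_k contains the ⌈log₂ kn⌉ bit tests.

```latex
\begin{aligned}
\mathrm{Opt}(TC_k) &= \mathrm{Opt}(SC_k) + \lceil \log_2 kn \rceil
                    = k\,\mathrm{Opt}(SC) + \lceil \log_2 kn \rceil
  && \text{(Lemma 5, } |S_{SC_k}| = kn)\\
|\text{solution to } SC_k| &\le A_\phi(2kn)\,\mathrm{Opt}(TC_k) - \lceil \log_2 kn \rceil
  \le A_\phi(2kn)\bigl(k\,\mathrm{Opt}(SC) + \lceil \log_2 kn \rceil\bigr)\\
\min_{1 \le c \le k} |\text{solution to copy } c| &\le \frac{1}{k}\,|\text{solution to } SC_k|
  \le A_\phi(2kn)\Bigl(\frac{\lceil \log_2 kn \rceil}{k} + \mathrm{Opt}(SC)\Bigr)
\end{aligned}
```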

Theorem 2. Test collection is hard to approximate within o(log n) unless P = 
NP, where n is the number of elements in the ground set. 

Proof. Suppose there exists an algorithm φ that can approximate test collection within o(log n). Let SC be any instance of the set covering problem. Solve SC using TCtoSC(SC, φ, ⌈ln n⌉). By Observation 1, TCtoSC runs in time polynomial in n. By Lemma 6 and the definition of the performance guarantee, the size of the solution returned by TCtoSC is o(log(2n⌈ln n⌉)) · (⌈log₂(n⌈ln n⌉)⌉/⌈ln n⌉ + Opt(SC)) = o(log n · Opt(SC)). This holds because ⌈log₂(n⌈ln n⌉)⌉/⌈ln n⌉ is bounded by a constant and Opt(SC) is a positive integer. This contradicts the results of Arora and Sudan, giving the claim. □
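The only quantitative step in this proof is that the additive term ⌈log₂(kn)⌉/k stays bounded when k = ⌈ln n⌉. A quick numerical sanity check in Python follows; the helper name is illustrative.

```python
import math


def additive_term(n):
    """ceil(log2(k*n)) / k for k = ceil(ln n), the choice made in the proof
    of Theorem 2 (requires n >= 2 so that k >= 1)."""
    k = math.ceil(math.log(n))
    return math.ceil(math.log2(k * n)) / k
```

Analytically, the term tends to 1/ln 2 ≈ 1.44. Multiplying it by an o(log n) guarantee therefore still gives o(log n · Opt(SC)).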

A slight refinement of our main theorem follows. 

Theorem 3. If there is some ε > 0 such that a polynomial-time algorithm can approximate test collection within (1 − ε) ln n, then NP ⊆ DTIME(n^{O(log log n)}). Here n is the number of elements in the ground set.



5 Test Collection with Small Tests 

In this section, we consider the special case when all the tests are small. We first give an algorithm for the case when the size of the largest set is bounded by k; its performance guarantee is O(ln k). We then look at the special case when the size of the sets is bounded by 2. We first verify the NP-hardness of the problem restricted to this special case, and then give a (7/6 + ε)-approximation algorithm for any ε > 0.

Test Collection with Tests of Size at Most k

We consider the general case where the size of each test is bounded above by a constant k. Using the approximation guarantee of Lemma 3 and the reduction of the test collection problem to the set covering problem in Lemma 4 gives an O(log((n − k)k)) approximation guarantee for the test collection problem when the size of the maximum set is bounded by k. This is because a test of size k is reduced to a set of size (n − k)k in the set covering problem. In other words, in the cut cover view such a test is converted to a cut in the complete graph in which one shore has at most k nodes, so the cut contains at most k(n − k) edges. We give a new algorithm that improves this guarantee to O(log k).

We first prove a relation between the test collection problem and the set covering problem on the identical instance, where we think of the tests as covering sets in the set cover instance. This is not a problem reduction like the one in Lemma 4.

Lemma 7. Let TC(S, C) be an instance of the test collection problem, and let SC(S, C) be the set covering instance on the same ground set whose sets are the tests of TC. Then Opt(SC(S, C)) ≤ Opt(TC(S, C)) + 1, where Opt(x) denotes the size of an optimal solution to problem x.

Proof. By the definition of the test collection problem, all the elements of the ground set S must have different incidence patterns in any solution to TC. In particular, at most one element can have the all-zero pattern. So in a valid solution to TC, all but at most one element of S is contained in at least one member of the solution. Thus any solution to TC is a set cover for all but at most one member of S. Adding one set of C that contains the remaining member yields a set cover; such a set exists whenever SC is feasible. □

We give a new algorithm for the test collection problem that performs particularly well when the size of the largest set is small.

Algorithm 2 SmallTC(S, C)

Phase I (Cover): Identity map TC(S, C) to a set covering problem SC_I(S, C). That is, the test collection and the set cover instance have the same ground set, and the tests in TC(S, C) become sets in SC_I(S, C) (Lemma 7).

Solve SC_I using the greedy algorithm for set covering (Lemma 3), and let C_I be the solution obtained. Note that C_I is not necessarily a test collection; it is only a partial solution to the TC instance.

Phase II (Reduce and Cover): Reduce the remaining test collection problem, together with the partial solution C_I, to a set covering problem SC_R using the reduction outlined in Lemma 4.

Solve SC_R using the greedy algorithm and map the solution to a subcollection C_R ⊆ C.

Return C_I ∪ C_R.
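The two phases of SmallTC can be sketched in Python as below, under some assumptions. Tests are sets, and the greedy rule is the standard one referred to as Lemma 3. Phase II builds the residual pair cover directly from the incidence patterns of the Phase I tests rather than materializing the graph. All names are hypothetical.

```python
from itertools import combinations


def greedy_cover(universe, sets):
    """Greedy set cover; stops early once no set covers anything new."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            break  # leftover items are not coverable by any set
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


def small_tc(ground, tests):
    """Sketch of Algorithm 2 (SmallTC); returns indices of chosen tests."""
    tests = [frozenset(t) for t in tests]
    # Phase I (Cover): the tests themselves are the covering sets.
    phase1 = greedy_cover(ground, tests)

    def pattern(x):
        return tuple(x in tests[i] for i in phase1)

    # Phase II (Reduce and Cover): pairs still sharing an incidence pattern
    # must be separated; a test covers the residual pairs it separates.
    residual = {frozenset((x, y)) for x, y in combinations(sorted(ground), 2)
                if pattern(x) == pattern(y)}
    separated = [{p for p in residual if len(p & t) == 1} for t in tests]
    phase2 = greedy_cover(residual, separated)
    return sorted(set(phase1) | set(phase2))


def is_test_collection(ground, chosen):
    return len({tuple(x in t for t in chosen) for x in ground}) == len(ground)
```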

Theorem 4. The approximation guarantee of algorithm SmallTC(S, C) is O(ln k), where k is the size of the largest test in C.

Proof. In the first phase of the algorithm we identity map the test collection problem to a set cover problem, where each test in the test collection problem becomes a set in the set cover problem. The maximal set size in SC_I is hence k. By Lemma 3 we have |C_I| ≤ (1 + ln k) Opt(SC_I), and by Lemma 7 we have Opt(SC_I) ≤ Opt(TC) + 1. Thus |C_I| = O(ln k) Opt(TC), observing that Opt(TC) ≥ ⌈log₂ |S|⌉ ≥ log₂ k.

We now apply the reduction given in Lemma 4 to turn an instance of the test collection problem into an instance of the set covering problem. Recall from our previous discussion that this reduction views the elements of the test collection problem as nodes. The task is then to cover all the edges in the induced complete graph using edges of the available cuts. The partial solution has covered a large part of the edges, and the remaining graph consists of connected components, each of size at most k. This is because elements that are not yet distinguished share an incidence pattern. Apart from at most one element with the all-zero pattern, such elements lie together in a common test of size at most k. The size of each set in the set covering reduction of Lemma 4 is now at most k(k − 1), since each set has at most k nodes and each node has degree at most k − 1. Since an optimal test collection also covers all remaining edges, Opt(SC_R) ≤ Opt(TC). By Lemma 3 we therefore have |C_R| ≤ (1 + ln(k(k − 1))) Opt(SC_R) ≤ O(ln k) Opt(TC).

The result follows by combining the contributions from the two phases. □ 



Test Collection with Tests of Size at Most Two 

Using a reduction from 3-dimensional matching similar to the one in Garey and Johnson's book [...], it is easy to show that the test collection problem remains NP-hard even if the sets are limited to at most three elements. Garey and Johnson also state ([...], p. 222) that when the size of the sets is limited to two, the problem is solvable in polynomial time. However, a last-page update added to later reprints corrects this and claims the problem is NP-hard.

Theorem 5. The minimum test collection problem where the size of the largest 
set is limited to two is APX-complete. 

Due to lack of space, we omit the (routine) proof of the above theorem, which uses a reduction from maximum bounded 3-dimensional matching. The above theorem implies that there is no polynomial-time approximation scheme for this problem unless P = NP [...].

We will use the notation 2-TC for test collection problems where the size of the sets is bounded by two. It is easy to show that we can assume without loss of generality that 2-TC contains only tests of size two. We may therefore refer to the tests of 2-TC as edges. Next we give an approximation algorithm for 2-TC. Before that, we make an observation that changes our perspective on 2-TC and motivates the algorithm.

Lemma 8. 2-TC is equivalent to the following problem: Given a graph, find an edge subset that defines a subgraph with at most one isolated node, in which every other connected component has at least three nodes, and in which the number of components is maximized.

Proof. Let TC_2 be an instance of 2-TC. In a solution to TC_2, at most one node has no adjacent edges; every other node must be adjacent to at least one edge in the solution. Furthermore, the solution cannot have single-edge components, since the endpoints of such an edge have the same incidence pattern. Therefore each component must have at least 3 nodes. Conversely, in such a subgraph, two nodes with the same incidence pattern would have to be the endpoints of an edge forming a component by itself. Hence every such subgraph is a valid solution. Minimizing the number of edges in the solution is the same as maximizing the number of components in the graph induced by the solution, since each component will be a tree in an optimal (minimal) solution. □
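Lemma 8's equivalence can be checked exhaustively on small graphs. The check compares the test collection condition (distinct incidence vectors) with the structural condition (at most one isolated node, no two-node component). The sketch below uses a small union-find, and all names are illustrative.

```python
from itertools import combinations


def distinct_patterns(nodes, chosen):
    """Test collection condition: every node has its own incidence vector."""
    nodes = list(nodes)
    return len({tuple(v in e for e in chosen) for v in nodes}) == len(nodes)


def structural_ok(nodes, chosen):
    """Lemma 8 condition: at most one isolated node and no 2-node component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for e in chosen:
        a, b = tuple(e)
        parent[find(a)] = find(b)
    sizes = {}
    for v in parent:
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    isolated = sum(1 for s in sizes.values() if s == 1)
    return isolated <= 1 and all(s != 2 for s in sizes.values())


def agree_on_all_subsets(nodes, edges):
    """Compare both conditions on every edge subset of a small graph."""
    edges = [frozenset(e) for e in edges]
    return all(distinct_patterns(nodes, sub) == structural_ok(nodes, sub)
               for r in range(len(edges) + 1)
               for sub in combinations(edges, r))
```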

By the previous lemma, we can use an algorithm for set packing with 3-sets as a subroutine in solving 2-TC. In particular, we can use the algorithm of Hurkens and Schrijver [...], which has a performance guarantee of 3/2 + ε for any ε > 0.

Algorithm 3 Approx(TC_2)

Reduce: Given an instance TC_2 of 2-TC, construct an instance SP of set packing with 3-sets, on the same ground set. For every triple x, y, z in the ground set of TC_2, if at least two of the edges (x, y), (x, z), (y, z) occur in TC_2, include {x, y, z} in the set packing instance SP.

Solve: Fix an ε > 0 and run the algorithm of [...] for set packing with 3-sets on SP, with performance guarantee 3/2 + ε.

Construct a solution S_SP: for each 3-set in the solution to the set packing problem, add a corresponding pair of edges to S_SP.

Complete the solution by adding edges to S_SP that connect every node, except possibly one isolated node, to other, larger connected components in the graph. This forms S_TC.

Return S_TC.
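The following is a runnable sketch of Algorithm 3 with one deliberate substitution. Instead of the Hurkens and Schrijver local search, it uses a simple greedy maximal 3-set packing. It therefore illustrates the Reduce, Construct and Complete steps but does not inherit the 7/6 + ε guarantee. Function names are hypothetical.

```python
from itertools import combinations


def approx_2tc(nodes, edges):
    """Sketch of Algorithm 3 for 2-TC. ASSUMPTION: a greedy maximal 3-set
    packing replaces the Hurkens-Schrijver algorithm (no 7/6 + eps bound).
    Returns a list of edges (frozensets), or None if Complete fails."""
    E = {frozenset(e) for e in edges}
    # Reduce: triples spanned by at least two edges of the instance.
    triples = [t for t in combinations(sorted(nodes), 3)
               if sum(frozenset(q) in E for q in combinations(t, 2)) >= 2]
    # Solve (substitute): greedily keep pairwise-disjoint triples.
    covered, packing = set(), []
    for t in triples:
        if covered.isdisjoint(t):
            packing.append(t)
            covered.update(t)
    # Construct: two instance edges per packed triple (a 3-node tree).
    solution = []
    for t in packing:
        present = [frozenset(q) for q in combinations(t, 2) if frozenset(q) in E]
        solution.extend(present[:2])
    # Complete: attach uncovered nodes one at a time to covered neighbours.
    progress = True
    while progress:
        progress = False
        for v in sorted(set(nodes) - covered):
            for u in sorted(covered):
                if frozenset((u, v)) in E:
                    solution.append(frozenset((u, v)))
                    covered.add(v)
                    progress = True
                    break
    # At most one node may stay isolated in a valid 2-TC solution.
    return solution if len(set(nodes) - covered) <= 1 else None


def is_test_collection(nodes, chosen):
    return len({tuple(v in e for e in chosen) for v in nodes}) == len(set(nodes))
```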

Lemma 9. If TC_2 is feasible, then Algorithm 3 returns a valid solution to 2-TC and runs in polynomial time.

Proof. The Reduce step can be done in time O(m²), where m is the number of edges in TC_2. The algorithm of [...] runs in polynomial time for any fixed ε and is guaranteed to return a maximal solution. The Construct step can be done in time O(m). If TC_2 is feasible, then the Complete step can be done in time O(n²) by iteratively looking at all nodes in the graph that are not incident to edges in the current solution. Because the set packing solution is maximal, every node except possibly one isolated node can be connected to the components of the current solution. We can therefore add one node at a time, connecting it to the current solution. □

Lemma 10. Algorithm 3 gives a (7/6 + ε)-approximation to 2-TC, for any ε > 0.

Proof. The size of the solution to the test collection problem is the same as the number of edges in the solution. This equals the number of nodes in the graph less the number of components in the solution.

Let SP be the reduced set packing problem. The algorithm for finding a set packing with 3-sets [...] has a (3/2 + ε)-approximation guarantee, for any fixed ε > 0. At most one edge is added for each node not in a 3-set. The size of the solution returned by the algorithm is therefore at most

$$2\,Apx(SP) + (n - 3\,Apx(SP)) = n - Apx(SP) \le n - \frac{Opt(SP)}{3/2 + \varepsilon}.$$

Since a set packing solution can be recovered from a test cover solution in a natural way,

$$Opt(TC) \ge Opt(SP) + (n - 2\,Opt(SP)) = n - Opt(SP).$$

The approximation guarantee of Approx(TC_2) is then

$$\frac{Apx(TC)}{Opt(TC)} \le \frac{n - \frac{Opt(SP)}{3/2 + \varepsilon}}{n - Opt(SP)}.$$

The right-hand side increases with Opt(SP), and Opt(SP) ≤ n/3 since the packed 3-sets are disjoint. Substituting Opt(SP) = n/3 gives (3/2)(1 − 2/(9 + 6ε)), which tends to 7/6 as ε → 0. Hence the performance ratio of Approx(TC_2) is at most 7/6 + ε′, for any fixed ε′ > 0. □
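The final step of the proof says the bound is largest at Opt(SP) = n/3 and equals 7/6 there when ε = 0. Both claims can be verified with exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction


def ratio_bound(n, opt_sp, rho):
    """Bound (n - opt_sp / rho) / (n - opt_sp) on Apx(TC)/Opt(TC) from the
    proof of Lemma 10; rho is the set packing guarantee (3/2 + eps)."""
    return (n - Fraction(opt_sp) / rho) / (n - opt_sp)
```

The bound grows with Opt(SP), so Opt(SP) = n/3 is the worst case.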
Abstract. The problem of computing tandem repetitions with K possible mismatches is studied. Two main definitions are considered, and for both of them an O(nK log K + S) algorithm is proposed (S the size of the output). This improves, in particular, the bound obtained in [...].



1 Introduction 

Repetitions (periodicities) play a central role in word combinatorics [...]. Repetitions are also important from the application perspective. As an example, their properties can be used to speed up pattern matching algorithms [...].

The problem of efficiently identifying repetitions in a given word is one of the classical pattern matching problems [...]. A tandem repeat, or a square, is a pair of consecutive occurrences of a subword in a word. For example, baba is a tandem repeat in the word cbacbabacba. It has been known since the beginning of the 80s [7] that checking whether a word contains no tandem repeat (i.e., is square-free) can be done in time O(n), where n is the length of the word. If one wants to find all tandem repeats, their number comes into consideration: the word a^n contains Θ(n²) tandem repeats. If we restrict ourselves to primitive squares (i.e., subwords uu where u is not itself of the form v^k for k ≥ 2), then a word may contain O(n log n) of them, and this bound is tight. All primitive squares can be found in time O(n + S), where S is their number [...], hence in worst-case time O(n log n).
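A naive cubic-time enumeration of squares is enough to see these counts on short words. It is not one of the linear or O(n log n) algorithms cited above, and the function name is illustrative.

```python
def squares(w):
    """All occurrences of tandem repeats (squares) uu in w, found naively.
    Returns (start, p) pairs with w[start:start+p] == w[start+p:start+2p]."""
    n = len(w)
    return [(i, p)
            for p in range(1, n // 2 + 1)
            for i in range(n - 2 * p + 1)
            if w[i:i + p] == w[i + p:i + 2 * p]]
```

For a^n, every half-length p ≤ n/2 contributes n − 2p + 1 occurrences, which gives the quadratic total.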

In [...], we studied maximal repetitions (see also [...]). They can be viewed as maximal runs of squares uu, i.e., series of squares of equal length, each shifted by one letter with respect to the previous one. For example, bcbacacacaab contains the maximal repetition acacaca, which is a succession of four squares: acac, caca, acac, caca. Thus, the set of maximal repetitions can be regarded as an encoding of all tandem repeats in the string. We showed [...] that this encoding is more compact in the worst case, as there are only O(n) maximal repetitions in words of length n. Moreover, all of them can be found in time O(n) [...].

More recently, searching for repetitions in a string has received a new motivation from biosequence analysis [...]. Successive occurrences of a fragment often carry important information in DNA sequences, and their presence is characteristic of many genomic structures (such as telomere regions, for example). From a practical viewpoint, satellites and Alu repeats are involved in chromosome analysis and genotyping, and thus are of great interest to genomic researchers. Tools for finding successive repeats are nowadays an obligatory part of integrated systems for analyzing and annotating whole genomes [...].

The major difficulty in finding biologically relevant repetitions in genomic sequences is that a certain variation must be admitted between the copies of the repeated subword. In other words, biologists are interested in approximate repetitions, not only in exact ones. The first natural definition of approximate repetition is an approximate tandem repeat. This is a subword uv where u and v are within a given distance K, for a distance commonly used in biological applications such as Hamming distance or edit distance. The problem of finding approximate tandem repeats for both these distances has been studied by G. Landau and J. Schmidt [...]. They showed that for the Hamming distance, all approximate tandem repeats can be found in time O(nK log(n/K) + S), and for the edit distance in time O(nK log K log n + S), where S is the number of repeats found. Several other approaches to finding approximate tandem repeats in DNA sequences have been proposed in the bioinformatics community. Some of them use a statistical framework [...], and some require specifying the size of the repeated motif [...]. Others use a very general framework and have to rely on heuristic filtering steps to avoid exponential blow-up [...].

This paper deals with finding approximate repetitions using exact combinatorial methods of string matching. We focus on the Hamming distance case, where the only variation allowed between repeated copies is letter replacement. An important motivation is to define structures encoding families of approximate tandem repeats, analogous to maximal repetitions in the exact case. In Section 2 we define two fundamental structures: globally-defined approximate repetitions and runs of approximate tandem repeats. In Section 3 we show that all globally-defined approximate repetitions can be found in time O(nK log K + S), where S is their number. In Section 4 we show that the same bound holds for runs of approximate tandem repeats: all of them can be found in time O(nK log K + R), where R is their number. This result implies, in particular, that all approximate tandem repeats can be found in time O(nK log K + T), where T is their number. This improves the O(nK log(n/K) + T) bound of G. Landau and J. Schmidt for the most interesting case of small K. Finally, in the last section we give some concluding remarks and mention possible extensions of the presented results.

Due to space limitations, we present only a high-level description of the algorithms and refer to the extended version [...] for pseudo-code.

2 K-Mismatch Globally-Defined Repetitions and Runs of K-Mismatch Tandem Repeats

As noted in [...], one difficulty in dealing with (approximate) tandem repeats is defining them accurately. Even if we concentrate only on mismatches, as in this paper, several different definitions of approximate repetitions are possible. Here we introduce two basic notions of approximate repetitions.

We start by briefly recalling some facts about exact repetitions. The period of a word w[1 : n] is the minimal natural number p such that w[i] = w[i + p] for all i with 1 ≤ i and i + p ≤ n. The ratio n/p is called the exponent of w. A repetition is any word with exponent greater than or equal to 2 [...]. A tandem repeat, or a square, is a word which is the concatenation of another word with itself. Equivalently, a tandem repeat is a repetition whose exponent is an even natural number. When the exponent is equal to 2, the tandem repeat (square) is called primitive. The following proposition is well known (see [...]).

Proposition 1. A word r[1 : n] is a repetition with period p ≤ n/2 if and only if one of the following conditions holds:

(i) r[1..n − p] = r[p + 1..n], and p is the minimal number with this property,

(ii) any subword of r of length 2p is a tandem repeat, and p is the minimal number with this property.

When considering repetitions as subwords of a bigger word, the notion of maximality turns out to be very useful. A repetition is maximal iff it cannot be extended (by one letter) to the right or left while keeping the same period. Formally, let w[1 : n] be a word and w[i..j] a subword which is a repetition of period p. This repetition is called maximal if the period of both w[i..j + 1] (provided that j < n) and w[i − 1..j] (provided that i > 1) is strictly larger than p. For example, acaabaababc contains the repetition (tandem repeat) aabaab. It is not maximal, as the a which follows it respects the periodicity. On the other hand, aabaaba occurs as a maximal repetition. Maximal repetitions were studied in [...].
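Here is a naive Python sketch of maximal repetitions that follows the definition above. For each candidate period p, it scans maximal stretches where w[i] = w[i + p]. It keeps those of length at least 2p whose smallest period is exactly p. It is quadratic or worse and is only meant to illustrate the definition. Names are hypothetical, and positions are 0-based and inclusive.

```python
def smallest_period(u):
    """Minimal p >= 1 with u[i] == u[i + p] for all valid i (len(u) if none)."""
    for p in range(1, len(u)):
        if all(u[i] == u[i + p] for i in range(len(u) - p)):
            return p
    return len(u)


def maximal_repetitions(w):
    """Naive enumeration of maximal repetitions (exponent >= 2) of w as
    0-based inclusive triples (start, end, period)."""
    n, found = len(w), set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            s = i
            while i + p < n and w[i] == w[i + p]:
                i += 1
            # w[s..i-1+p] has period p and cannot be extended on either side;
            # it is a repetition iff its length i - s + p is at least 2p.
            if i - s >= p and smallest_period(w[s:i + p]) == p:
                found.add((s, i - 1 + p, p))
    return found
```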




We now turn to defining approximate repetitions. As in the exact case, the basic notion here is the approximate tandem repeat. Let h(·, ·) be the Hamming distance between two words of equal length; that is, h(w_1, w_2) is the number of mismatches (letter differences at corresponding positions) between w_1 and w_2. For example, h(baaacb, bcabcb) = 2.

Definition 1. A word a = a′a″ such that |a′| = |a″| is called a K-mismatch tandem repeat iff h(a′, a″) ≤ K. Reusing the terminology of the exact case [...], we call the number p = |a′| = |a″| the period of a, and the words a′, a″ the left and right root of a, respectively.
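Definition 1 translates directly into code. Here is a minimal sketch with hypothetical names:

```python
def hamming(u, v):
    """Number of mismatching positions between two words of equal length."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs equal lengths")
    return sum(a != b for a, b in zip(u, v))


def is_k_mismatch_tandem(a, K):
    """Definition 1: a = a'a'' with |a'| = |a''| and h(a', a'') <= K."""
    if len(a) == 0 or len(a) % 2:
        return False
    p = len(a) // 2  # the period of a
    return hamming(a[:p], a[p:]) <= K
```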

We now want to define a more global structure that can capture "long approximate periodicities", generalizing repetitions with arbitrary exponent in the exact case. Unlike in the exact case, Conditions (i) and (ii) of Proposition 1 generalize to different notions of approximate repetition. Condition (i) gives rise to the strongest of them:

Definition 2. A word r[1 : n] is called a K-mismatch globally-defined repetition with period p, p ≤ n/2, iff h(r[1..n − p], r[p + 1..n]) ≤ K.

Equivalently, r[1 : n] is a K-mismatch globally-defined repetition with period p if the number of i such that r[i] ≠ r[i + p] is at most K. For example, abaa abba cbba cb is a 2-mismatch globally-defined repetition with period 4. The word abc abc abc abb abc abc abc abc is a 2-mismatch globally-defined repetition with period 3, since its only mismatches are the two caused by the block abb. However, abc abc abc abb abc abc abc abb is not, as it has three mismatches.
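Here is a direct check of Definition 2. It uses 0-based indices, so position i is compared with i + p, and the names are illustrative. It also confirms the mismatch counts for the examples above.

```python
def period_mismatches(r, p):
    """0-based positions i with r[i] != r[i + p]; their number is
    h(r[1..n-p], r[p+1..n]) in the 1-based notation of the text."""
    return [i for i in range(len(r) - p) if r[i] != r[i + p]]


def is_global_rep(r, p, K):
    """Definition 2: 1 <= p <= n/2 and at most K period mismatches."""
    return 1 <= p and 2 * p <= len(r) and len(period_mismatches(r, p)) <= K
```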

Another viewpoint, expressed by Condition (ii) of Proposition 1, considers a repetition as an encoding of the squares it contains [...]. Projecting this to the approximate case leads to the notion of a run of approximate tandem repeats:

Definition 3. A word r[1 : n] is called a run of K-mismatch tandem repeats of period p, p ≤ n/2, iff for every i ∈ [1..n − 2p + 1], the subword a = r[i..i + 2p − 1] = r[i..i + p − 1] r[i + p..i + 2p − 1] is a K-mismatch tandem repeat of period p.
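Definition 3 can be checked in linear time with a sliding window over the mismatch indicators r[i] ≠ r[i + p]. The window starting at i contains exactly the indicators i..i + p − 1. This is a sketch with hypothetical names, not the detection algorithm of the paper.

```python
def is_run(r, p, K):
    """Definition 3: every length-2p window of r is a K-mismatch tandem
    repeat. Sliding count over the indicators r[i] != r[i + p] (0-based)."""
    n = len(r)
    if p < 1 or 2 * p > n:
        return False
    bad = [r[i] != r[i + p] for i in range(n - p)]
    window = sum(bad[:p])  # mismatches of the window starting at 0
    if window > K:
        return False
    for i in range(1, n - 2 * p + 1):
        window += bad[i + p - 1] - bad[i - 1]
        if window > K:
            return False
    return True
```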

As in the exact case, when we look for approximate repetitions occurring in a word, it is natural to consider maximal approximate repetitions. These are repetitions extended to the right and left as far as possible while the corresponding definition is still satisfied. The notion of maximality applies to both definitions of approximate repetition considered above. In both cases we can extend a repetition to the right or left as long as the resulting subword remains a repetition according to the definition. Throughout this paper we are always interested in maximal repetitions, without mentioning it explicitly. For both notions, the maximality requirement implies the following: if w[i : j] is a repetition of period p in w[1 : n], then w[j + 1] ≠ w[j + 1 − p] (provided j < n) and w[i − 1] ≠ w[i − 1 + p] (provided i > 1). Furthermore, a maximal globally-defined repetition w[i : j] contains exactly K mismatches w[l] ≠ w[l + p], i ≤ l, l + p ≤ j. The only exception is when the whole word w contains fewer than K mismatches; to simplify the presentation, we always exclude this case from consideration.

Example 1. The following Fibonacci word contains three runs of 3-mismatch tandem repeats of period 6, which are shown below the word. Two of them are identical, and each of these contains four 3-mismatch globally-defined repetitions. These are listed for the first run only, in the four lines following it. The third run is a 3-mismatch globally-defined repetition in itself.

010010 100100 101001 010010 010100 1001 

10010 100100 101001 
10010 100100 10 
0010 100100 101 
10 100100 10100 
0 100100 101001 
1001 010010 010100 1 
10 010100 1001 

In general, each K-mismatch globally-defined repetition is a subword of a run of K-mismatch tandem repeats. Conversely, a run of tandem repeats in a word is the union of all globally-defined repetitions it contains. However, a run of tandem repeats may contain a linear number of globally-defined repetitions. For example, the word (000100)^n of length 6n is a run of 1-mismatch tandem repeats of period 3, and it contains 2n − 1 1-mismatch globally-defined repetitions. More generally, the following observation holds.

Lemma 1. Let w[1 : n] be a run of K-mismatch tandem repeats of period p, and let s be the number of mismatches w[i] ≠ w[i + p], 1 ≤ i, i + p ≤ n (equivalently, s = h(w[1..n − p], w[p + 1..n])). Then w contains s − K + 1 globally-defined repetitions.
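Lemma 1 can be checked by brute force. The idea is to enumerate all subwords of length at least 2p with at most K mismatches that cannot be extended by one letter on either side. The cubic-time sketch below is for illustration only; names are hypothetical, and positions are 0-based and inclusive.

```python
def maximal_global_reps(w, p, K):
    """Brute force: all maximal K-mismatch globally-defined repetitions of
    period p occurring in w, as 0-based inclusive (start, end) pairs."""
    n = len(w)
    bad = [w[i] != w[i + p] for i in range(n - p)]

    def mism(a, b):  # mismatches w[j] != w[j + p] with a <= j, j + p <= b
        return sum(bad[a:b - p + 1])

    reps = []
    for a in range(n):
        for b in range(a + 2 * p - 1, n):
            if mism(a, b) > K:
                continue
            left_ok = a == 0 or mism(a - 1, b) > K
            right_ok = b == n - 1 or mism(a, b + 1) > K
            if left_ok and right_ok:
                reps.append((a, b))
    return reps
```

On the runs used in the check, the count matches s − K + 1. This includes the (000100)^n family from the text, which gives 2n − 1.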

Both definitions can be criticized with regard to their relevance to practical situations. An obvious property of runs is that the repeated pattern can change completely along the run, regardless of the value of K. For example, aaa aba abb abb bbb is a run of 1-mismatch tandem repeats of period 3, although the 3-letter patterns aaa and bbb have nothing in common. On the other hand, globally-defined repetitions put a global limit on the number of mismatches. They may therefore miss some repetitions that one would like to count as such, in particular repetitions with big exponents. In these, the total number of mismatches can exceed K while the relative number of mismatches remains low. However, these two structures are of primary importance, as they provide respectively the weakest and the strongest notions of repetitions with K mismatches. They therefore "embrace" all practically relevant repetitions. In what follows, we propose efficient algorithms to find both types of repetitions.

3 Finding K-Mismatch Globally-Defined Repetitions

In this section we describe how to find, in a given word w, all maximal K-mismatch globally-defined repetitions occurring in w (K is a given constant). On the one hand, our algorithm extends the one for exact maximal repetitions. On the other hand, it generalizes the one of [...] (see also [...]) by using a special factorization of the word to speed up the algorithm.

To proceed, we need more definitions. Consider a globally-defined repetition r = w[i..j] of period p in a word w[1 : n]. The subword w[i..i + p − 1] is called the left root of r, and w[j − p + 1..j] its right root. We say that r contains the character w[l] iff i ≤ l ≤ j. We say that r touches w[l] iff r contains w[l] or one of the characters w[l − 1], w[l + 1].

Our first basic technique is described by the following auxiliary problem. Given a word w[1 : n] and a distinguished character w[l], l ∈ [2..n − 1], we wish to find all K-mismatch globally-defined repetitions in w which touch w[l]. We distinguish two disjoint classes of repetitions according to whether their right root starts to the left or to the right of w[l]. We concentrate on repetitions of the first class; those of the second class are found similarly.

For each p S [1../ — 1], and for all k G [0..AT], we compute the following 
functions : 



LPk{p) = max{ j\h{w[l - p.. I -p-G j - l],w[l..l -G j - 1]) < k}, (1) 

LSk{p) = vaeix{j\h{w[l - p - j..l -p - l],w[l - j..l - 1]) < k}. (2) 
Informally, LP_k(p) is the length of the longest subword in w starting at position l-p and equal, within k mismatches, to a subword starting at position l. Similarly, LS_k(p) is the length of the longest subword ending at position l-p-1 and equal, within k mismatches, to a subword ending at position l-1. These functions are variants of longest common extension functions jl Yll ll) and can be computed in time O(nK) using suffix trees combined with lowest common ancestor computations in a tree. We refer to PH for a detailed description of the method.
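To make definitions (1) and (2) concrete, the sketch below computes LP_k(p) and LS_k(p) by extending character by character until a (k+1)-th mismatch would occur. It is a naive illustration, assuming 1-based positions exactly as in (1) and (2); it costs O(n) per value rather than using the O(nK) suffix tree and lowest common ancestor method referred to above, and the names `lp` and `ls` are hypothetical.

```python
def lp(w: str, l: int, p: int, k: int) -> int:
    """LP_k(p) from (1), 1-based: longest j with h(w[l-p..l-p+j-1], w[l..l+j-1]) <= k."""
    n = len(w)
    j = mismatches = 0
    while l + j <= n:  # next pair to compare: w[l-p+j] vs w[l+j] (1-based)
        if w[l - p + j - 1] != w[l + j - 1]:
            if mismatches == k:  # one more mismatch would exceed the budget k
                break
            mismatches += 1
        j += 1
    return j


def ls(w: str, l: int, p: int, k: int) -> int:
    """LS_k(p) from (2), 1-based: longest j with h(w[l-p-j..l-p-1], w[l-j..l-1]) <= k."""
    j = mismatches = 0
    while l - p - j - 1 >= 1:  # next pair to compare: w[l-p-j-1] vs w[l-j-1] (1-based)
        if w[l - p - j - 2] != w[l - j - 2]:
            if mismatches == k:
                break
            mismatches += 1
        j += 1
    return j


# w = "xabcabcy", distinguished position l = 5, period p = 3
print(lp("xabcabcy", 5, 3, 0))  # 3: "abc" matches "abc", then 'a' != 'y'
print(ls("xabcabcy", 5, 3, 1))  # 1: the single allowed mismatch is 'x' vs 'c'
```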

Consider now a K-mismatch globally-defined repetition r = w[i..j] with period p which has its right root starting to the left of w[l]. Note that the character w[l-p] is contained in r, and that r is uniquely defined by the number of its mismatches located at or to the right of w[l-p], i.e. between w[l-p..j-p] and w[l..j]. Let k be the number of those mismatches.

Then

$$LP_k(p) + LS_{K-k}(p) \ge p. \tag{3}$$

Conversely, (3) can be used to detect a repetition. The following theorem holds (see p7irn| L), which is a generalization of the corresponding result of

Theorem 1. Let w[1 : n] be a word and w[l], 1 < l < n, a distinguished character. There exists a K-mismatch globally-defined repetition with period p which contains w[l] and has its right root starting to the left of w[l] iff for some k ∈ [0..K],

$$LP_k(p) < p, \tag{4}$$

and inequation (3) holds. In this case, this repetition starts at position l - p - LS_{K-k}(p) and ends at position l + LP_k(p) - 1.

Inequation (4) ensures that the right root starts to the left of w[l].

Theorem 1 provides an O(nK) algorithm for finding all considered globally-defined repetitions: compute the longest extension functions (1) and (2) (this takes time O(nK)) and then check inequations (3) and (4) for all p ∈ [1..l-1] and all k ∈ [0..K] (this takes time O(nK) too). Each time the inequations are verified, a new repetition is identified. Finding repetitions with the right root starting to the right of w[l] is a symmetric problem, which is solved within the same time bound.

The algorithm solving the auxiliary problem described above will be referred 
to as Algorithm 1. 
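The left-hand half of Algorithm 1 can be sketched directly from Theorem 1: loop over p ∈ [1..l-1] and k ∈ [0..K], and report (start, end, p) whenever (3) and (4) hold. This minimal sketch repeats naive `lp`/`ls` helpers so it runs on its own, so it does not reach the O(nK) bound; the name `algorithm1_left` is hypothetical, and the use of a set (in case two values of k yield the same interval near the word boundaries) is an assumption of the sketch.

```python
def lp(w, l, p, k):
    """LP_k(p) of (1), 1-based, by naive extension."""
    j = mism = 0
    while l + j <= len(w):
        if w[l - p + j - 1] != w[l + j - 1]:
            if mism == k:
                break
            mism += 1
        j += 1
    return j


def ls(w, l, p, k):
    """LS_k(p) of (2), 1-based, by naive extension."""
    j = mism = 0
    while l - p - j - 1 >= 1:
        if w[l - p - j - 2] != w[l - j - 2]:
            if mism == k:
                break
            mism += 1
        j += 1
    return j


def algorithm1_left(w, l, K):
    """Repetitions around w[l] whose right root starts left of w[l].

    Returns sorted (start, end, period) triples, 1-based and inclusive.
    """
    assert 2 <= l <= len(w) - 1
    found = set()
    for p in range(1, l):
        LP = [lp(w, l, p, k) for k in range(K + 1)]
        LS = [ls(w, l, p, k) for k in range(K + 1)]
        for k in range(K + 1):
            # (4): right root starts left of w[l]; (3): total length >= 2p
            if LP[k] < p and LP[k] + LS[K - k] >= p:
                found.add((l - p - LS[K - k], l + LP[k] - 1, p))
    return sorted(found)


print(algorithm1_left("abaababaab", 6, 1))
```

Each reported triple (i, j, p) satisfies h(w[i..j-p], w[i+p..j]) ≤ K, since the mismatches split into at most K - k counted by LS and at most k counted by LP.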

The second important tool is the Lempel-Ziv factorization used in the well-known compression method. Let w be a word and assume that the last symbol of w does not occur elsewhere in w. In this paper, we need two variants of the Lempel-Ziv factorization, which we call with copy overlap and without copy overlap.¹

Definition 4. The Lempel-Ziv factorization of w with copy overlap (respectively, without copy overlap) is the factorization w = f_1 f_2 ... f_m, where the f_i's are defined inductively as follows:

¹ The s-factorization used in rmni is a minor modification of the Lempel-Ziv factorization with copy overlap. The difference is that the s-factorization considers the longest factor occurring earlier, while the Lempel-Ziv factorization considers the shortest factor which does not occur earlier (see jinj for a related discussion). In this paper, we use the Lempel-Ziv factorization, which better suits our purposes.



176 R. Kolpakov and G. Kucherov 



(i) f_1 = w[1],

(ii) for i ≥ 2, f_i is the shortest word occurring in w immediately after f_1 f_2 ... f_{i-1} which does not occur in f_1 f_2 ... f_i other than as a suffix (respectively, does not occur in f_1 f_2 ... f_{i-1}).

As an example, the Lempel-Ziv factorization with copy overlap of the word aabbabababbbc is a|ab|ba|bababb|bc; the factorization without copy overlap is a|ab|ba|bab|abbb|c. Both variants of the Lempel-Ziv factorization can be computed in linear time E2Cn|. If w = f_1 f_2 ... f_m is the Lempel-Ziv factorization, we call the f_i's Lempel-Ziv factors, or simply factors, of w. The last character of f_i will be called the head of f_i.
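Both variants of Definition 4 can be illustrated with a deliberately naive sketch (cubic time, not the linear-time construction cited above) that grows each factor until it stops occurring in the allowed prefix: with copy overlap the earlier occurrence may overlap the factor but must start before it; without copy overlap it must lie entirely within f_1 ... f_{i-1}. The function name `lz_factorize` is hypothetical. On the example word it reproduces both factorizations given above.

```python
def lz_factorize(w: str, overlap: bool = True) -> list[str]:
    """Naive Lempel-Ziv factorization following Definition 4.

    overlap=True: f_i is the shortest word starting after f_1...f_{i-1} that does
    not occur in f_1...f_i other than as a suffix (an earlier copy may overlap f_i).
    overlap=False: f_i must not occur anywhere in f_1...f_{i-1}.
    """
    factors = []
    start, n = 0, len(w)
    while start < n:
        length = 1
        while start + length <= n:
            candidate = w[start:start + length]
            # an earlier occurrence must lie inside w[:limit] (0-based, end-exclusive)
            limit = start + length - 1 if overlap else start
            if candidate not in w[:limit]:
                break
            length += 1
        length = min(length, n - start)  # only matters if the last symbol repeats
        factors.append(w[start:start + length])
        start += length
    return factors


print("|".join(lz_factorize("aabbabababbbc", overlap=True)))   # a|ab|ba|bababb|bc
print("|".join(lz_factorize("aabbabababbbc", overlap=False)))  # a|ab|ba|bab|abbb|c
```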

We are now ready to describe the algorithm for finding all K-mismatch globally-defined repetitions. Consider the Lempel-Ziv factorization of w with copy overlap. The algorithm consists of three stages. The key to the first stage is the following lemma.

Lemma 2. The right root of a K-mismatch globally-defined repetition cannot contain as a subword K + 1 consecutive Lempel-Ziv factors.

Proof. Each factor contained in the right root contains a character mismatching the one located one period to the left. Indeed, if it did not contain a mismatch, it would have an exact copy occurring earlier, which contradicts the definition of the factorization. As the right root contains at most K mismatches, it cannot contain K + 1 or more factors.

We divide w into consecutive blocks of K + 2 Lempel-Ziv factors. Let w = B_1 ... B_{m'} be the partition of w into such blocks. The last character of B_i will be called the head character of this block. At the first stage, we find, for each block B_i, those repetitions which touch the head character of B_i but do not touch that of B_{i+1}. First, we concentrate on those repetitions whose right root starts before the head character of B_i.

Lemma 3. Assume a K-mismatch globally-defined repetition r touches the head character of B_i but not that of B_{i+1}. Then |r| ≤ 2|B_i B_{i+1}|.

Proof. Lemma 2 implies that the right root of r cannot start before the first character of B_i. Therefore, the period of r is bounded by |B_i B_{i+1}|. On the other hand, by the argument of the proof of Lemma 2, r cannot extend by more than a period to the left of B_i. Therefore, the total length of r is bounded by 2|B_i B_{i+1}|.

Lemma 3 allows us to apply Algorithm 1. Consider the word W_i = v B_i B_{i+1}, where v is the suffix of B_1 ... B_{i-1} of length |B_i B_{i+1}|. Then, using Algorithm 1, find all repetitions in W_i touching the head character of B_i and discard those which touch the head character of B_{i+1}. The resulting complexity is O(K(|B_i| + |B_{i+1}|)).

After processing all blocks, we have found all repetitions touching block head characters. Observe that repetitions resulting from processing different blocks are distinct. Summing up over all blocks, the resulting complexity of the first stage is O(nK). The repetitions which remain to be found are those which lie entirely within a block; they are found at the next two stages.
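The bookkeeping of the first stage can be sketched as follows: group the factor lengths into blocks of K + 2 factors, record each block head (last character), and compute the window W_i = v B_i B_{i+1} with |v| = |B_i B_{i+1}|. This is a sketch under stated assumptions: positions are 0-based and windows half-open, v is clipped at the start of w, the last block is treated as having no successor, and the name `stage1_windows` is hypothetical. The repetition search inside each window would be Algorithm 1.

```python
def stage1_windows(factor_lengths: list[int], K: int) -> list[tuple[int, int, int]]:
    """Return (head, window_start, window_end) for each block of K + 2 factors.

    Positions are 0-based; each window is the half-open range [window_start, window_end).
    """
    size = K + 2
    blocks, pos = [], 0
    for b in range(0, len(factor_lengths), size):
        length = sum(factor_lengths[b:b + size])
        blocks.append((pos, pos + length))  # block B_i = w[start:end]
        pos += length
    windows = []
    for i, (start, end) in enumerate(blocks):
        next_end = blocks[i + 1][1] if i + 1 < len(blocks) else end
        span = next_end - start  # |B_i B_{i+1}|
        # W_i = v B_i B_{i+1}, v = suffix of B_1...B_{i-1} of length span (clipped at 0)
        windows.append((end - 1, max(0, start - span), next_end))
    return windows


# Factor lengths of a|ab|ba|bababb|bc (word aabbabababbbc), K = 0: two factors per block
print(stage1_windows([1, 2, 2, 6, 2], K=0))  # [(2, 0, 11), (10, 0, 13), (12, 9, 13)]
```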

At the second stage we find all repetitions inside each block B_i which touch factor heads other than the block head (i.e., the last character of the block). For each B_i, we proceed by a simple binary division approach:
(i) divide the current block of factors B = f_i f_{i+1} ... f_{i+m} into two sub-blocks B' = f_i ... f_{i+⌊m/2⌋} and B'' = f_{i+⌊m/2⌋+1} ... f_{i+m},

(ii) using Algorithm 1, find the repetitions in B which touch the head character of f_{i+⌊m/2⌋}, but discard those which touch the head character of f_{i+m} or contain the first character of f_i,

(iii) process B' and B'' recursively.

The above algorithm has ⌈log K⌉ levels of recursion. Since at each level the word is split into disjoint sub-blocks, the calls of Algorithm 1 at one level take O(nK) time in total, and the whole complexity of the second stage is O(nK log K).
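The recursion of the second stage can be traced without performing the searches themselves. The sketch below only records, in preorder, the calls the binary division makes for a block whose factors are numbered lo..hi; a call (lo, mid, hi) stands for "repetitions touching the head of f_mid, minus those touching the head of f_hi or containing the first character of f_lo". The function name is hypothetical. In the usage example, every factor head except the block head is the middle of exactly one call.

```python
def binary_division(lo: int, hi: int, calls: list[tuple[int, int, int]]) -> None:
    """Record the stage-2 calls for the sub-block of factors f_lo .. f_hi (preorder)."""
    m = hi - lo
    if m < 1:  # a single factor: no inner factor head left to process
        return
    mid = lo + m // 2  # index of f_{i + floor(m/2)}
    calls.append((lo, mid, hi))  # search around the head of f_mid
    binary_division(lo, mid, calls)      # B'  = f_lo .. f_mid
    binary_division(mid + 1, hi, calls)  # B'' = f_{mid+1} .. f_hi


K = 5
calls = []
binary_division(0, K + 1, calls)  # one block of K + 2 factors: f_0 .. f_{K+1}
print(sorted(mid for _, mid, _ in calls))  # [0, 1, 2, 3, 4, 5]
```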

Finally, at the third stage, it remains to find the repetitions which occur entirely inside each Lempel-Ziv factor, namely those which do not contain its first character and do not touch its head character. By the definition of the factorization with copy overlap (Definition 4), each factor without its head character has another (possibly overlapping) occurrence to the left. Therefore, each of these repetitions has another occurrence to the left too. Using this observation, these repetitions can be found using the same technique as the one of m. When constructing the Lempel-Ziv factorization, we keep for each factor f_i = va a pointer to a copy of v to the left. Then, processing factors from left to right, we recover the repetitions inside each factor from its pointed copy. We refer to m for algorithmic details. The complexity of this stage is O(n + S), where S is the number of repetitions found. The following theorem summarizes this section.

Theorem 2. All K-mismatch globally-defined repetitions can be found in time O(nK log K + S), where n is the word length and S is the number of repetitions found.



4 Finding Runs of K-Mismatch Tandem Repeats

In this section we describe an algorithm for finding all runs of K-mismatch tandem repeats in a word.

The general structure of the algorithm is the same as for globally-defined repetitions: it has three stages playing similar roles. At the first and second stages, the key difference is the type of objects we are looking for: instead of computing globally-defined repetitions, we now compute subruns of K-mismatch tandem repeats. Formally, a subrun is a run of K-mismatch tandem repeats which is not necessarily maximal. At each point of the first and second stages where we search for repetitions touching some head character w[l], we now compute subruns of those K-mismatch tandem repeats which touch w[l]. This can be seen as outputting by Algorithm 1 only the part of the globally-defined repetition falling into the interval [l-2p..l+2p].

The major additional difficulty of computing runs is assembling subruns into 
runs. To perform the assembling, we need to store subruns in an additional data 
structure and to carefully manage merging of subruns into bigger runs. We have 
to ensure that the number of subruns we come up with and the work spent 
on processing them do not increase the resulting complexity bound. Below we
describe the three stages of the algorithm in more detail.
We identify a subrun with the interval of end positions of the tandem repeats
it contains. 

For the input word w, we compute the Lempel-Ziv factorization without copy overlap and divide it into blocks B_1 ... B_{m'}, each containing K + 2 consecutive Lempel-Ziv factors. Note that Lemma 2 still holds for the factorization without copy overlap. At the first stage, we find subruns of all those tandem repeats which touch block head characters. For each block B_i, we find the tandem repeats which touch the head character of B_i but not that of B_{i+1}. Let l_i be the position of the head character of B_i. Then the subruns of period p found at this step belong to the interval [l_i - 1..min{l_i + 2p, l_{i+1} - 2}]. We call this interval the explored interval for w[l_i] and p. The subruns found at this step can be seen as subintervals of this explored interval. These subruns are stored in a doubly-linked list in increasing order of positions. If the explored interval for w[l_{i-1}] ends at position l_i - 2, it is merged with the explored interval for w[l_i], thus forming a bigger explored interval. Accordingly, the lists of subruns associated with these intervals are merged into a single list. All additional operations take constant time, and the resulting complexity of the first stage is O(nK).
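A minimal sketch of this bookkeeping for one period p is shown below, with Python lists standing in for the doubly-linked lists (so concatenation here costs linear rather than constant time). A new explored interval is merged with the previous one when they are adjacent, and its subruns are appended. Fusing two subruns whose end-position intervals are contiguous is an assumption of this sketch, based on identifying a subrun with its interval of end positions; only merging with the preceding interval is shown, as happens at the first stage, and the name `add_explored` is hypothetical.

```python
def add_explored(intervals: list, lo: int, hi: int, subruns: list) -> None:
    """Add explored interval [lo, hi] for one period p, with its sorted subruns.

    intervals: [[lo, hi, subruns], ...] in increasing order; each subrun is an
    interval [a, b] of end positions of tandem repeats.
    """
    subruns = [list(s) for s in subruns]  # private copy of the caller's subruns
    if intervals and intervals[-1][1] + 1 == lo:  # adjacent intervals: merge
        prev = intervals[-1]
        prev[1] = hi
        if prev[2] and subruns and prev[2][-1][1] + 1 == subruns[0][0]:
            prev[2][-1][1] = subruns[0][1]  # contiguous subruns fuse into one
            subruns = subruns[1:]
        prev[2].extend(subruns)  # concatenate the two subrun lists
    else:
        intervals.append([lo, hi, subruns])


intervals = []
add_explored(intervals, 4, 10, [[5, 7], [9, 10]])
add_explored(intervals, 11, 15, [[11, 12], [14, 14]])
add_explored(intervals, 20, 25, [[21, 22]])
print(intervals)  # [[4, 15, [[5, 7], [9, 12], [14, 14]]], [20, 25, [[21, 22]]]]
```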

The second stage is modified in a similar way. Recall that at each call of the modified Algorithm 1 we are searching for repetitions occurring between some factor head, say w[l'], and another factor head w[l''], and touching some factor head w[l] (l' < l < l''). Assuming that recursive calls are executed in preorder (see the description of the second stage in the previous section), no factor head between w[l'] and w[l''] has been processed yet. In this case, the explored interval is [max{l' + 2p + 1, l - 1}..min{l + 2p, l'' - 2}], and we may have to merge it either with the previous explored interval, or with the next one, or both. The complexity of the second stage stays O(nK log K).

After accomplishing the first and second stages, we have, for each period p, a set of non-intersecting explored intervals. Each interval is associated with a sequence of successive head characters w[l_i], w[l_{i+1}], ..., w[l_m] such that l_{j+1} - l_j ≤ 2p + 2 for j ∈ [i..m-1], and the interval itself is [l_i - 1..l_m + 2p]. In particular, the interval is associated to w[l_i] and w[l_m], the first and the last head characters of this sequence. Those subruns of tandem repeats which have actually been found within this interval are stored in a doubly-linked list associated to the interval.

At the third stage, we have to find subruns of those tandem repeats which 
lie entirely inside Lempel-Ziv factors. For each period, potential occurrences of 
these subruns correspond precisely to the gaps between explored intervals. Thus, 
the third stage can be also seen as closing up the gaps between explored intervals 
for this period. 

As in the previous section, the key observation here is the fact that Lempel- 
Ziv factors without their head character have a copy to the left (here required 
to be non-overlapping), and the idea is again to process w from left to right and 
to retrieve the subruns inside each factor from its copy. However, the situation 
here is different in comparison to globally-defined repetitions: we may have to 
“cut out” a chain of subruns belonging to the factor copy from a longer list 
and then to “fit” it into the gap between two explored intervals. The “cutting 
out” may entail splitting subruns which span over the borders of the factor 
copy, and “fitting into” may entail merging those subruns with subruns from 
the neighboring explored intervals. Below we sketch the algorithm for the third 
stage, which copes with these difficulties. 

During the computation of the Lempel-Ziv factorization, for each Lempel-Ziv factor f_i = va we choose a copy of v occurring earlier and point from the end position of this copy to the head character a of f_i. It may happen that one position has several pointers, in which case we organize them in a list.
We traverse w from left to right and maintain for the current position the last 
runs (of all possible periods) which start before this character. To this purpose, 
we also maintain the following invariant: at the moment we arrive at a position, 
we know the list of all subruns which start at this position. This information is 
collected according to the following general rule: for each subrun starting at the 
current position, we assign the starting position of the next subrun in the list. 
Of course, there may be no next subrun if the current subrun is the last one in 
the explored interval. In this case, the starting position of the subrun following 
the current subrun will be set at the moment we fill the gap after this explored 
interval. 

When we arrive at the end position of a copy of a Lempel-Ziv factor, we need 
to copy into the factor all the subruns which this copy contains. Therefore, we 
scan backwards the subruns contained in the copy and copy them to the factor. 
Copying the subruns closes up two explored intervals into one interval, and links 
together two lists of subruns, possibly inserting a new list of runs in between. 
Copying subruns in the backward direction is important for the correctness of the algorithm: it guarantees that no subruns are missed. It is also for this reason that we need the copy to be non-overlapping with the factor.

After the whole word has been traversed, no gaps between explored intervals remain. This means that for each period, we have a list of subruns with this period occurring in the word, which are actually the searched runs. The complexity of the third stage is O(n + S), where S is the number of resulting runs. Putting together the three stages, we obtain the main result of this section.

Theorem 3. All runs of K-mismatch tandem repeats can be found in time O(nK log K + S), where n is the word length and S is the number of runs found.

Once all runs have been found, we can easily output all tandem repeats. We then have the following result, improving the result of ini.

Corollary 1. All K-mismatch tandem repeats can be found in time O(nK log K + S), where n is the word length and S is the number of tandem repeats found.

5 Concluding Remarks 

We proposed O(nK log K + S) algorithms for finding K-mismatch globally-defined repetitions and runs of K-mismatch tandem repeats (S is the output size). Note that if K is considered constant, we have O(n + S) algorithms for finding each of these structures. This is an interesting result, which had long been unknown even for the exact case.
Globally-defined repetitions and runs of tandem repeats provide respectively
the strongest and the weakest notion of approximate repetitions. For practical
applications, such as genome analysis, it might be interesting to consider inter- 
mediate definitions with respect to the two “extreme” cases. In the extended 
version d, we introduced two such types of approximate repetitions, so-called 
K -mismatch uniform repetitions and K -mismatch consensus repetitions, and an- 
alyzed their relationship to the notions of repetition considered in this paper. 
However, designing an efficient algorithm for finding repetitions of those types 
remains an open problem. 

In the final stage of preparation of this paper, we became aware of the paper HH|. In that paper, yet another definition of approximate repetitions is considered, which is weaker than globally-defined repetitions, but stronger than both uniform repetitions and consensus repetitions from m. The algorithm presented in HBj runs in time O(nKE log(n/K)), where E is the maximal exponent of the reported repetitions.

The algorithms presented in this paper are now being implemented within the mreps software. Currently, mreps implements the algorithm for finding exact maximal repetitions d. Some interesting experiments have been done by applying mreps to genomic sequences |H].

Acknowledgments. We thank Mathieu Giraud, with whom we had first dis- 
Abstract. Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation. They are of fundamental importance for a variety of applications, including medical diagnostics and drug design. They also provide the highest-resolution genomic fingerprint for tracking disease genes. This paper is devoted to algorithmic problems related to computational SNPs validation based on genome assembly of diploid organisms. In diploid genomes, there are two copies of each chromosome. A description of the SNPs sequence information from one of the two chromosomes is called a SNPs haplotype. The basic problem addressed here is Haplotyping, i.e., given a set of SNPs prospects inferred from the assembly alignment of a genomic region of a chromosome, find the maximally consistent pair of SNPs haplotypes by removing data “errors” related to DNA sequencing errors, repeats, and paralogous recruitment. In this paper, we introduce several versions of the problem from a computational point of view. We show that the general SNPs Haplotyping Problem is NP-hard for mate-pairs assembly data, and design polynomial time algorithms for fragment assembly data. We give a network-flow based polynomial algorithm for the Minimum Fragment Removal Problem, and we show that the Minimum SNPs Removal problem amounts to finding the largest independent set in a weakly triangulated graph.



1 Introduction 

1.1 Motivation 

The large-scale laboratory discovery and typing of genomic sequence variation 
presents considerable challenges and it is not certain that the present technolo- 
gies are sufficiently sensitive and scalable for the task. Computational methods 
that are intertwined with the experimental technologies are emerging, leading 
the way for this discovery process. 

Single nucleotide polymorphisms (SNPs) are the most frequent form of hu- 
man genetic variation and provide the highest-resolution genomic fingerprint for 
tracking disease genes. The SNPs discovery process has several stages: sample 
collection, DNA purification, amplification of loci, sequence analysis, and data 
management. The “large-scale” dimension of the analysis refers to either the 
large number of loci or of individuals. Effective integration of these stages is 
important for the strategies employed in the pipeline. The choice of method, for 
any stage, could be influenced by the processes used in other stages. 

The SNPs discovery involves the analysis of sequence differences and haplotype separation from several samples of a genomic locus from a given population. The two leading methods are based on Shotgun Genome Assembly (SGA) and on PCR amplification (PCR). The methods have complementary strengths and recognized computational bottlenecks associated with them. The SGA, in principle, generates haploid genotyping and does not require sequence information for the loci; however, it needs good library coverage, and distinguishing paralogous repeats from polymorphism is computationally very challenging. The PCR method requires knowledge of the genomic region of the locus and can be done very effectively; however, it is expensive for large-scale projects.

There is a need for powerful computational approaches that are sensitive enough and scalable, so that they can remove noisy data and provide effective algorithmic strategies for these technologies. This paper is a first step towards such “computational SNPology”. It is devoted to algorithmic problems related to computational SNPs discovery and validation based on genome assembly. The basic problem is to start from a set of SNPs prospects inferred from the assembly alignment and to find the maximal consistent subset of SNPs by removing “errors” related to sequencing errors, repeats, and paralogous recruitment.



1.2 Preliminaries 

Recent whole-genome sequencing efforts have confirmed that the genetic makeup of humans is remarkably well-conserved, and that small regions of differences are responsible for our diversity. The smallest possible region consists of a single nucleotide, and is called a Single Nucleotide Polymorphism, or SNP (“snip”). This is a position in our genome at which we can have one of two possible values (alleles), while in the neighborhood of this position we all have identical DNA content. Since our DNA is organized in pairs of chromosomes, for each SNP we can be either homozygous (same allele on both chromosomes) or heterozygous (different alleles). Independently of what the actual different alleles at a SNP are, in the sequel we will denote the two values that each SNP can take by the letters A and B. A chromosome content projected on a set of SNPs (or haplotype) is then simply a string over the alphabet {A, B}, while a genotype is a pair of such strings, one for each haplotype.

DNA sequencing techniques are restricted to small, overlapping fragments. 
Such fragments can contain errors (e.g., due to low quality reads), and can 
come from either one of the two chromosome copies. Further, e.g. in shotgun
sequencing, some pairs of these fragments (mate pairs) are known to come from
the same copy of a chromosome and to have a given distance between them. The
basic problem is then the following: “Given a set of fragments obtained by DNA 
sequencing from the two copies of a chromosome, reconstruct the two haplotypes 
from the SNPs values observed in the fragments.” 
Fig. 1. (a) M. (b) G_F(M). (c) G_S(M). (d) An odd cycle of fragments.



Note that it is possible, even in an error-free scenario, that the above problem 
cannot be solved because the information is insufficient. For instance, if a set of 
fragments does not share any SNPs with any of the remaining fragments, we may 
only be able to reconstruct partial haplotypes, but then we wouldn’t know how 
to merge them into only two (a problem known as phasing). It is better then to 
relax the requirement from “reconstruct the two haplotypes” to “reconstruct two 
haplotypes that would be compatible with all the fragments observed”. Stated 
in this form, the problem becomes trivial for the error-free case (as we will see in 
the sequel, it is simply the problem of determining the two shores of a bipartite 
graph). However, experiments in molecular biology are never error-free, and, 
under a general parsimony principle, we are led to reformulate the problem as 
“Find the smallest number of errors in the data so that there exist two haplotypes 
compatible with all the (corrected) fragments observed.” 

Depending on the errors considered, we will define several different com- 
binatorial problems. “Bad” fragments can be due either to contaminants (i.e. 
DNA coming from a different organism than the actual target) or to read er- 
rors (i.e. a false A, a false B, or a - inside a fragment, which represents a SNP 
whose value was not determined). A dual point of view assigns the errors to the 
SNPs, i.e. a “bad” SNP is a SNP for which some fragments contain read errors. 
Correspondingly, we may define the following optimization problems: “Find the minimum number of fragments to ignore” or “Find the minimum number of SNPs to ignore”, such that “the (corrected) data is consistent with the existence of two haplotypes measured by error-free DNA sequencing. Find such haplotypes.”



1.3 Notation 

The basic framework for SNPs problems is as follows. There is a set S = {1, ..., n} of snips and a set F = {1, ..., m} of fragments. Each snip is covered by some of the fragments, and can take the values A or B. Hence, a snip i is defined by a pair of disjoint subsets of fragments, A_i and B_i. There is a natural (canonical) ordering of the snips, given by their physical location on the chromosome, from left to right. Then, the data can also be thought of as an m × n matrix over the alphabet {A, B, -}, which we call the SNP matrix, defined in the obvious way.
The Fragment- and Snip-conflict graphs. We say that two fragments i and j are in conflict if there exists a snip k such that i ∈ A_k, j ∈ B_k or i ∈ B_k, j ∈ A_k. Given a SNP matrix M, the fragment conflict graph is the graph G_F(M) = (F, E_F) with an edge for each pair of fragments in conflict. Note that if M is error-free, G_F(M) is a bipartite graph, since each haplotype defines a shore of G_F(M), made of all the fragments coming from that haplotype. Conversely, if G_F(M) is bipartite, with shores H_1 and H_2, all the fragments in H_1 can be merged into one haplotype, and similarly for H_2. We call a SNP matrix M feasible (and infeasible otherwise) if G_F(M) is bipartite, and we call the haplotypes obtained by merging the fragments on each shore derived from M. For K a set of rows (fragments), we denote by M[K] the submatrix of M containing only the rows in K. The fundamental underlying problem in SNPs haplotyping is determining an optimal set of changes to M (e.g., row and/or column deletion) so that M becomes feasible. We remark that G_F(M) is the union of n complete bipartite graphs, one for each column j of M, with shores A_j and B_j.
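For an error-free matrix, the observation above gives a direct procedure: build G_F(M), 2-colour it by breadth-first search, and merge the rows on each shore into the derived haplotypes. The sketch below assumes rows are equal-length strings over {A, B, -}; the function names are hypothetical, and it returns None when G_F(M) is not bipartite, i.e. when M is infeasible. Components are coloured independently, so for disconnected inputs it returns one of the compatible pairs.

```python
from collections import deque
from itertools import combinations


def fragment_conflict_graph(rows: list[str]) -> list[set[int]]:
    """Adjacency sets of G_F(M): two rows conflict if a column has A in one, B in the other."""
    adj = [set() for _ in rows]
    for i, j in combinations(range(len(rows)), 2):
        if any({a, b} == {"A", "B"} for a, b in zip(rows[i], rows[j])):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def derive_haplotypes(rows: list[str]):
    """Return the two derived haplotypes if M is feasible, else None."""
    adj = fragment_conflict_graph(rows)
    shore = [None] * len(rows)
    for root in range(len(rows)):  # BFS 2-colouring, one component at a time
        if shore[root] is not None:
            continue
        shore[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if shore[v] is None:
                    shore[v] = 1 - shore[u]
                    queue.append(v)
                elif shore[v] == shore[u]:  # odd cycle: G_F(M) is not bipartite
                    return None
    haplotypes = [["-"] * len(rows[0]) for _ in range(2)]
    for row, side in zip(rows, shore):  # merge the rows of each shore
        for c, x in enumerate(row):
            if x != "-":
                haplotypes[side][c] = x
    return "".join(haplotypes[0]), "".join(haplotypes[1])


print(derive_haplotypes(["AB-", "-BA", "BA-"]))  # ('ABA', 'BA-')
print(derive_haplotypes(["AAA---", "-BAA--", "--BAA-", "---AAA"]))  # None
```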

We now turn to snip conflicts. We say that two snips i and j are in conflict if A_i, B_i, A_j, B_j are all nonempty and there exist two fragments u and v such that the submatrix defined by rows u and v and columns i and j has three symbols of one type (A or B) and one of the opposite type (B or A, respectively). Given a SNP matrix M, the snip conflict graph is the graph G_S(M) = (S, E_S), with an edge for each pair of snips in conflict. In Section 2.2 we will state the fundamental theorem relating the two conflict graphs.
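The snip conflict test can be checked mechanically: both columns must contain an A and a B, and some pair of rows must have no '-' in the 2 × 2 submatrix, with exactly one or exactly three A's. The sketch below assumes rows are strings over {A, B, -} with 0-based columns, and its function names are hypothetical. On a small gapless matrix whose fragment conflict graph is a triangle, it finds one snip conflict, as Theorem 3 in Section 2.2 would predict.

```python
from itertools import combinations


def snips_in_conflict(rows: list[str], i: int, j: int) -> bool:
    col_i = {r[i] for r in rows}
    col_j = {r[j] for r in rows}
    if not ({"A", "B"} <= col_i and {"A", "B"} <= col_j):  # A_i, B_i, A_j, B_j nonempty
        return False
    for u, v in combinations(range(len(rows)), 2):
        cells = [rows[u][i], rows[u][j], rows[v][i], rows[v][j]]
        # three symbols of one type and one of the other, no missing entry
        if "-" not in cells and cells.count("A") in (1, 3):
            return True
    return False


def snip_conflict_graph(rows: list[str]) -> list[tuple[int, int]]:
    """Edge list of G_S(M) over columns 0..n-1."""
    n = len(rows[0])
    return [(i, j) for i, j in combinations(range(n), 2) if snips_in_conflict(rows, i, j)]


print(snip_conflict_graph(["AAA---", "-BAA--", "--BAA-", "---AAA"]))  # [(1, 2)]
```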

In this paper we are going to define the following optimization problems: 

— MFR (Minimum Fragment Removal): Given a SNP matrix, remove the minimum number of fragments (rows) so that the resulting matrix is feasible.

— MSR (Minimum Snip Removal): Given a SNP matrix, remove the minimum number of snips (columns) so that the resulting matrix is feasible.

— LHR (Longest Haplotype Reconstruction): Given a SNP matrix, remove a set of fragments (rows) so that the resulting matrix is feasible, and the sum of lengths of the derived haplotypes is maximized.

A gapless fragment is one covering a set of consecutive SNPs. We say that a fragment has k gaps if it covers k + 1 blocks of consecutive SNPs. Such a fragment is equivalent to k + 1 gapless fragments with the constraint that they must all be put in the same haplotype or all discarded. Particularly important is the case k = 1, which is equivalent to 2 gapless fragments coming from the same chromosome. This is the case of mate pairs, used for shotgun sequencing |Z]. In the remainder of the paper we show that the above problems are NP-hard in general. Furthermore, we show that MFR is NP-hard even if only a single gap per fragment is allowed, and MSR is NP-hard for fragments with two gaps. On the positive side, we study the special case of gapless fragments, and show that in this case the problems can be solved efficiently. We provide polynomial algorithms for MFR, MSR and LHR. Note that the gapless case often arises in practical applications. Due to space limitations, some of the proofs are omitted in the sequel.
2 Getting Started: The Gapless Case 

The simplest scenario for the SNPs haplotype reconstruction problem is when 
the fragments are consecutive, gapless genomic regions. This is not an unreal- 
istic situation, since it arises, for example, in EST (Expressed Sequence Tags) 
mapping. This is in fact the case without mate pairs and with no missed SNPs 
inside a fragment because of thresholding or base-skipping read errors. 

2.1 The Minimum Fragment Removal 

In this section we show that in the gapless case, the minimum fragment removal (MFR) problem can be solved in polynomial time. For this section, we assume that there are no fragment inclusions, i.e., denoting by f_i and l_i the first and last snip of a fragment i, f_i < f_j implies l_i < l_j. We define a directed graph D = (F, A) as follows. Given two fragments i and j, with f_i < f_j, there is an arc (i, j) ∈ A if i and j can be aligned without any mismatch, i.e., they agree in all their common snips (possibly none). Note that the common snips are a suffix of i and a prefix of j.
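Under the same two assumptions as the text (gapless rows and no fragment inclusions), the arc set of D can be built by comparing, for every ordered pair with f_i < f_j, the columns f_j..l_i that the two rows share. The sketch represents rows as strings over {A, B, -} with 0-based columns; the helper names are hypothetical, and the construction is quadratic in the number of fragments.

```python
from itertools import permutations


def fragment_span(row: str) -> tuple[int, int]:
    """First and last covered column of a gapless row."""
    cols = [c for c, x in enumerate(row) if x != "-"]
    return cols[0], cols[-1]


def build_dag(rows: list[str]) -> list[tuple[int, int]]:
    """Arcs (i, j) of D: f_i < f_j and the rows agree on all common snips."""
    spans = [fragment_span(r) for r in rows]
    arcs = []
    for i, j in permutations(range(len(rows)), 2):
        (f_i, l_i), (f_j, _) = spans[i], spans[j]
        if f_i < f_j:
            common = range(f_j, l_i + 1)  # suffix of i = prefix of j (possibly empty)
            if all(rows[i][c] == rows[j][c] for c in common):
                arcs.append((i, j))
    return arcs


rows = ["AAA---", "-BAA--", "--BAA-", "---AAA"]
print(build_dag(rows))  # [(0, 3), (1, 3), (2, 3)]
```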

Lemma 1. Let M be a SNP matrix, and P_1, P_2 be node-disjoint directed paths in D such that |V(P_1)| + |V(P_2)| is maximum. Let R = F - (V(P_1) ∪ V(P_2)). Then R is a minimum set of fragments to remove such that M[F - R] is feasible.

Theorem 1. There is a polynomial time algorithm for finding P_1 and P_2 in D such that |V(P_1)| + |V(P_2)| is maximum.

Proof. We will use a reduction to a maximum cost flow problem. We turn D into a network as follows. First, we introduce a dummy source s, a dummy sink t, and an arc (t, s) of capacity 2 and cost 0. s is connected to each node i with an arc (s, i) of cost 0, and each node i gets connected to t, at cost 0 and capacity 1. Then, we replace each node i ∈ D with two nodes i' and i'' connected by an arc (i', i'') of cost 1 and capacity 1; arcs (s, i) become (s, i'), arcs (i, t) become (i'', t), and all original arcs (u, v) of D are replaced by arcs of type (u'', v'). A maximum cost circulation can be computed in polynomial time, e.g., by Linear Programming. Since D is acyclic, the solution is one cycle, which uses the arc (t, s) and then splits into two s-t dipaths, saturating as many arcs of type (i', i'') as possible, i.e. going through as many nodes of D as possible. Since the capacity of arcs (i', i'') is 1, the paths are node-disjoint.
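The proof computes a maximum cost circulation by Linear Programming. As a swapped-in but standard alternative, the sketch below builds the same split-node network and runs successive shortest paths (Bellman-Ford on negated costs), pushing at most two units from s to t, which corresponds to the circulation through the arc (t, s) of capacity 2. The node numbering i' = 2i, i'' = 2i + 1, s = 2m, t = 2m + 1 and the function name are assumptions of the sketch. It returns the fragments lying on the two paths, so R is the complement.

```python
def two_disjoint_paths_max_nodes(m: int, arcs: list[tuple[int, int]]) -> list[int]:
    """Fragments covered by two node-disjoint paths of D with the most nodes."""
    s, t = 2 * m, 2 * m + 1
    graph = [[] for _ in range(2 * m + 2)]  # edge: [to, residual_cap, cost, rev_idx]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
        return graph[u][-1]  # the forward edge

    split = []
    for i in range(m):
        add_edge(s, 2 * i, 1, 0)                         # (s, i')
        split.append(add_edge(2 * i, 2 * i + 1, 1, -1))  # (i', i''): gain 1, cost -1
        add_edge(2 * i + 1, t, 1, 0)                     # (i'', t)
    for u, v in arcs:
        add_edge(2 * u + 1, 2 * v, 1, 0)                 # (u'', v')

    INF = float("inf")
    for _ in range(2):  # at most two units of flow
        dist = [INF] * len(graph)
        prev = [None] * len(graph)
        dist[s] = 0
        for _ in range(len(graph) - 1):  # Bellman-Ford: residual costs may be negative
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == INF:
                    continue
                for idx, (v, cap, cost, _) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        updated = True
            if not updated:
                break
        if dist[t] >= 0:  # no augmenting path, or no further gain
            break
        v = t
        while v != s:  # push one unit along the shortest path
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u
    return [i for i, e in enumerate(split) if e[1] == 0]  # saturated (i', i'') arcs


kept = two_disjoint_paths_max_nodes(4, [(0, 3), (1, 3), (2, 3)])
print(len(kept))  # 3: one fragment of the odd cycle 0-1-2 must be removed
```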

With a similar reduction, we can show that the problem LHP can also be
solved in polynomial time. The problem consists in finding two haplotypes of
maximum total length (where the length of a haplotype is the number of SNPs
it covers). We use the same network as before, with the same capacities, but
different costs for the arcs. Now the arcs of type $(i', i'')$ have cost 0, while an
arc $(i'', j')$ has cost equal to the number of SNPs in $j$ that are not also in $i$
(e.g., the arc (-ABB, BBABA) has cost 3). Arcs $(s, i')$ have cost equal to the
number of SNPs in $i$. An $s$-$t$ unit flow in this network describes a path that goes
through some fragments, such that the total length of the SNPs spanned (i.e., of
the haplotype) is equal to the cost of the path. Hence, the maximum cost circulation
identifies two haplotypes of maximum total length. This proves the following:

Theorem 2. The LHP for gapless fragments is polynomial.
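For the LHP network only the arc costs change. A minimal sketch of those costs, assuming each gapless fragment is given by a (hypothetical) inclusive pair (first, last) of the SNP columns it covers, and assuming that along a path both endpoints are nondecreasing (no fragment is contained in its predecessor), which is the case in which the path cost equals the number of SNPs the haplotype covers:

```python
def snp_count(span):
    """Number of SNP columns in an inclusive (first, last) span; 0 if empty."""
    first, last = span
    return max(0, last - first + 1)


def lhp_arc_cost(span_i, span_j):
    """Cost of arc (i'', j'): SNPs of fragment j that fragment i does not cover.

    Assumes gapless fragments with first_i <= first_j and last_i <= last_j,
    so the SNPs of j not in i are exactly those to the right of last_i.
    """
    first_i, last_i = span_i
    first_j, last_j = span_j
    assert first_i <= first_j and last_i <= last_j
    return snp_count((max(first_j, last_i + 1), last_j))


def path_cost(spans):
    """Cost of the s-t path through the given fragments, in order.

    The arc (s, i_1') costs |i_1|, each (i'', j') arc costs lhp_arc_cost,
    and the (i', i'') and (i'', t) arcs cost 0.
    """
    total = snp_count(spans[0])
    for span_i, span_j in zip(spans, spans[1:]):
        total += lhp_arc_cost(span_i, span_j)
    return total
```

With -ABB covering columns 1 to 3 and BBABA covering columns 2 to 6, `lhp_arc_cost((1, 3), (2, 6))` is 3, matching the example in the text, and `path_cost([(1, 3), (2, 6)])` is 6, the number of columns the resulting haplotype spans.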
2.2 The Minimum Snip Removal

Theorem 3. Let $M$ be a gapless SNP matrix. Then $G_F(M)$ is a bipartite graph
if and only if $G_S(M)$ is an independent set.

Proof. (If) Consider a cycle of fragments in a SNP matrix (see Fig. 1). Wlog,
assume the cycle involves fragments $0, 1, \ldots, k$. For each pair of consecutive
fragments $i, i+1 \pmod{k+1}$ there is a position $u_i$ at which one has an A
and the other a B. We associate to a fragment cycle a directed cycle between
entries in the matrix, made of horizontal arcs from $u_{i-1}$ to $u_i$ in fragment $i$, and
vertical arcs from $u_i$ in fragment $i$ to $u_i$ in fragment $i+1$. We call a vertical line
a maximal run of vertical arcs in such a cycle. In a vertical line, the letters A
and B alternate. By definition, an infeasible SNP matrix contains an odd cycle of
fragments. We call the weight of an infeasible SNP matrix the minimum number
of vertical lines of any odd cycle of fragments it contains.

Assume there exists an infeasible gapless SNP matrix $M$ such that $G_S(M)$
is an independent set, and pick $M$ to have minimum weight among all such $M$.
Consider an odd cycle in $M$ achieving this weight. Since an infeasible matrix
cannot have weight 1, there are at least two vertical lines, and hence a "rightmost"
vertical line, say between fragments $f$ and $g$. Since the line is rightmost,
$u_{f-1}, u_g < u_f$. Assume $u_g > u_{f-1}$ (the argument is the same if $u_{f-1} > u_g$).
Since $M$ is gapless, there exists a symbol at row $f$, column $u_g$. The symbols
$M_{f,u_f}$ and $M_{g,u_f}$ are the same if and only if $M_{f,u_g}$ and $M_{g,u_g}$ are the same
(otherwise, the rows $f$ and $g$ would identify a snip conflict of columns $u_g, u_f$).
Now, consider the SNP matrix $M'$ obtained from $M$ by first deleting (i.e., replacing
with gaps) all the symbols between rows $f$ and $g$ (excluded), and then inserting an
alternating chain of As and Bs, starting with $M_{f,u_g}$, between rows $f$ and $g$ in
column $u_g$.

$M'$ is an infeasible gapless SNP matrix of weight at least one smaller than $M$.
Further, there are no snip conflicts in $M'$ that were not already in $M$, so $G_S(M')$
is an independent set. Hence, $M$ was not minimal.

(Only if) We omit the simple proof.

Note that, in the presence of gaps, only the "only if" part of the theorem
holds. We now show that the Minimum SNP Removal problem can be solved in
polynomial time on gapless SNP matrices. In particular, we prove that $G_S(M)$
is a perfect graph. The basic tool to do this is the following: if $I$ is the set of nodes
of a hole or an antihole in $G_S(M)$, and $\{i, j\}$ is a conflict with $i, j \in I$, then for
any $k \in I$ such that column $k$ lies between columns $i$ and $j$ in $M$, some relations
of $k$ with either $i$ or $j$ are forced. This will allow us to rule out long holes and
antiholes.

Lemma 2. Let $M$ be a gapless SNP matrix and $c_1, c_2, c_3$ be snips (columns of
$M$) with $c_1 < c_2 < c_3$. If $\{c_1, c_3\}$ is a snip conflict, then at least one of $\{c_1, c_2\}$
and $\{c_2, c_3\}$ is also a snip conflict.

Proof. There are two rows $r_1$ and $r_2$ such that the $2 \times 2$ submatrix induced
by rows $r_1, r_2$ and columns $c_1, c_3$ has three symbols of one type and one of the
opposite type. In this submatrix, call a $2 \times 1$ column of identical symbols type I and
one of different symbols type D. Since $M$ is gapless, both rows have a symbol in
column $c_2$, so $c_2$ must be either I or D. But one of $c_1$ and $c_3$ is I and the other D,
so $c_2$ forms a snip conflict (witnessed by the same two rows) with whichever of them
has the opposite type.
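Lemma 2 can be sanity-checked on small random gapless matrices. The sketch below is only an empirical check, not a proof; the string encoding of rows (`A`, `B`, and `-` for a gap) and all function names are assumptions made for illustration. `snip_conflict` implements the definition used in the proof: two columns conflict when, for some pair of rows covering both, one column is of type I and the other of type D.

```python
import itertools
import random

GAP = "-"


def snip_conflict(M, c1, c2):
    """True if columns c1 and c2 of M form a snip conflict.

    Some pair of rows must hold symbols (no gaps) in both columns, with one
    column of type I (identical symbols) and the other of type D (different
    symbols), i.e. the 2 x 2 submatrix has three symbols of one kind and one
    of the other.
    """
    for r1, r2 in itertools.combinations(range(len(M)), 2):
        cells = (M[r1][c1], M[r2][c1], M[r1][c2], M[r2][c2])
        if GAP in cells:
            continue
        type_i_c1 = M[r1][c1] == M[r2][c1]
        type_i_c2 = M[r1][c2] == M[r2][c2]
        if type_i_c1 != type_i_c2:
            return True
    return False


def random_gapless_matrix(rows, cols, rng):
    """Random rows over {A, B}, each covering one contiguous block of columns."""
    M = []
    for _ in range(rows):
        first = rng.randrange(cols)
        last = rng.randrange(first, cols)
        M.append("".join(rng.choice("AB") if first <= col <= last else GAP
                         for col in range(cols)))
    return M


def lemma2_holds(M):
    """Check Lemma 2 on M for every triple of columns c1 < c2 < c3."""
    for c1, c2, c3 in itertools.combinations(range(len(M[0])), 3):
        if snip_conflict(M, c1, c3) and not (
                snip_conflict(M, c1, c2) or snip_conflict(M, c2, c3)):
            return False
    return True
```

For instance, the rows `AA` and `AB` make columns 0 and 1 a snip conflict (column 0 is type I, column 1 is type D), while `AB` and `BA` do not (both columns are type D).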
Fig. 2. (a) A cycle without long jumps, (b) two jumps in a row, (c) a jump and a
shift, (d) the only possible holes.



Lemma 3. Let $M$ be a gapless SNP matrix and $c_1, c_2, c_3, c_4$ be snips with
$c_1 < c_2 < c_3 < c_4$. Assume $\{c_1, c_4\}$ is a snip conflict. Then, if $\{c_2, c_3\}$ is not a
conflict, one of $c_1$ and $c_4$ conflicts with both $c_2$ and $c_3$. If $\{c_2, c_3\}$ is a conflict,
then the conflicts in $\{c_1, c_2, c_3, c_4\}$ contain a cycle of length at most 4.

Lemma 4. If $M$ is a gapless SNP matrix, $G_S(M)$ does not have a chordless
cycle of length $> 4$.

Proof. Assume $C = (i_1, \ldots, i_k, i_{k+1} = i_1)$ is a chordless cycle, $k \ge 4$, in $G_S(M)$
(i.e., the columns of $M$, listed in the order defined by the cycle). For $t = 1, \ldots, k$,
let $B(t)$ be the set of nodes of $C$ which lie between $i_t$ and $i_{t+1}$ as columns of
$M$. We say that $t$ hits $i_t$ and $i_{t+1}$. We call $t$ a long jump if $|B(t)| > 1$, a jump if
$|B(t)| = 1$, and a shift if $|B(t)| = 0$. For a jump, we denote by $b_t$ the node such
that $B(t) = \{b_t\}$. The following facts are true: (i) if $C$ has a long jump, then
$k = 4$; (ii) if $C$ has no long jump, each jump $t$ must be followed by a shift,
pointing to the node $b_t$.

To prove (i), consider a long jump $t$, and let $B(t) = \{c_1, c_2, \ldots\}$. By Lemma 3,
if $\{c_1, c_2\}$ is not a conflict, then either $i_t$ or $i_{t+1}$ would have degree at least 3 in $C$,
which is impossible. Hence it must be a conflict. So, by Lemma 3, $\{i_t, c_1, c_2, i_{t+1}\}$
must contain a cycle, and hence $k = 4$, since $C$ cannot contain other cycles.

To prove (ii), note that if $t$ is a shift for $t = 1, \ldots, k-1$, then $k$ must be a
long jump. Hence, if there are no long jumps, there must be jumps (see Fig. 2(a)
for an example of a generic such cycle). Now, assume that a jump $t$ is followed
by another jump. Then (see Fig. 2(b)), since neither $b_t$ nor $b_{t+1}$ can be hit by a
long jump, there must be a jump hitting $b_t$ and $b_{t+1}$. But then Lemma 2 applies
to $b_t, i_{t+1}, b_{t+1}$, so that $i_{t+1}$ would have degree 3 in $C$. Hence, the jump $t$ is
followed by a shift. If the shift points away from $b_t$, then $b_t$ would have to be hit by a
long jump (see Fig. 2(c)), which is impossible. But then, the only possible hole is $C_4$.

Note that $C_4$ is actually achieved by some matrices (see Fig. 2(d)).

The following lemma generalizes Lemma 2.

Lemma 5. Let $M$ be a gapless SNP matrix and $c_1, c_2$ be snips, with $c_1 < c_2$
and $\{c_1, c_2\}$ a snip conflict. Let $(a_1, a_2, \ldots, a_t)$ be a path in $\bar{G}_S(M)$, the
complement of $G_S(M)$, such that $c_1 < a_i < c_2$ for all $i = 1, \ldots, t$. If $\{c_1, a_1\}$ is
not a snip conflict, then $\{a_t, c_2\}$ is a snip conflict. If $\{c_2, a_1\}$ is not a snip conflict,
then $\{a_t, c_1\}$ is a snip conflict.

Proof. Consider the $2 \times n$ submatrix of $M$ induced by the two rows witnessing the
conflict $\{c_1, c_2\}$, in which (wlog) $c_1$ is type I and $c_2$ is type D. Since $M$ is gapless,
the columns $a_1, \ldots, a_t$ must be I or D. We prove the first part of the lemma, the
other being the same argument. Let $a_0 := c_1$. Then $a_i$ is type I for all
$i = 1, \ldots, t$ to avoid a conflict of $a_{i-1}$ and $a_i$. This forces the conflict $\{a_t, c_2\}$.

Lemma 6. If $M$ is a gapless SNP matrix, $\bar{G}_S(M)$ does not have a chordless
cycle of length $> 4$.

Proof. Let us call a path in $\bar{G}_S(M)$ an antipath and a cycle in $\bar{G}_S(M)$ an
anticycle. Assume $(i_1, i_2, \ldots, i_k, i_{k+1} = i_1)$ is a chordless anticycle of length
$k \ge 5$. Let $i_x = \min\{i_1, \ldots, i_k\}$ and $i_y = \max\{i_1, \ldots, i_k\}$. $\{i_x, i_y\}$ cannot be a
snip conflict, or otherwise the part of the anticycle from $i_x$ to either $i_{y-1}$ or $i_{y+1}$
could be used in Lemma 5 to derive a contradiction. So, after possibly changing
origin and orientation of the antihole, we can assume that $x = 1$ and $y = 2$. We will
argue that the only possible ordering of these columns in $M$ is
$i_1 < i_3 < i_5 < \cdots < i_6 < i_4 < i_2$.
In fact, we can apply the same argument from left to right and from right to
left. Assume $i_3$ is not the successor of $i_1$, but $i_p$, $p > 3$, is. Then the antipath
$(i_3, \ldots, i_{p-1})$ would be contained within $i_p$ and $i_2$, and, by Lemma 5, $\{i_{p-1}, i_p\}$
would be a conflict, a contradiction since they are adjacent in the anticycle.
Similarly, now assume that $i_4$ is not the second-to-last, but some $i_p$, $p > 4$, is.
Then the antipath $(i_4, \ldots, i_{p-1})$ is contained within $i_3$ and $i_p$ and, by Lemma 5,
$\{i_{p-1}, i_p\}$ would be a conflict. We can continue this way to prove that this is the
only possible ordering, but the important part is that the ordering looks like
$i_1 < \cdots < i_4 < i_2$. Then the antipath $(i_k, i_{k-1}, \ldots, i_5)$ is contained within
$i_1$ and $i_4$, and $\{i_1, i_k\}$ is not a conflict. By Lemma 5, $\{i_5, i_4\}$ must be a conflict,
a contradiction.

Theorem 4. If $M$ is a gapless SNP matrix, then $G_S(M)$ is a perfect graph.

Proof. Because of Lemma 4 and Lemma 6, $G_S(M)$ is weakly triangulated, i.e.,
neither $G_S(M)$ nor its complement has a chordless cycle of length $> 4$. Since
weakly triangulated graphs are perfect (Hayward), the result follows.

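The next corollary follows from Theorem 3, Theorem 4, and the fact that a maximum
independent set can be found in polynomial time on perfect graphs. (Deleting
columns keeps a matrix gapless, so by Theorem 3 a set of snips can be kept exactly
when it is independent in $G_S(M)$; a minimum set of snips to remove is therefore
the complement of a maximum independent set.)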

Corollary 1. The Minimum SNP Removal problem can be solved in polynomial
time on gapless SNP matrices.



3 Dealing with Gaps 

If gaps in the fragments are allowed, SNP problems become considerably more
complex. Typically, a gap corresponds to a SNP whose value in a fragment that
in fact covers it was not determined (e.g., because of thresholding of low-quality
reads, or because of sequencing errors that missed some bases). Also, an important
case of gaps occurs when fragments are paired up in so-called mate pairs. These,
used in shotgun sequencing, are fragments taken from the same chromosome,
with some fixed distance between them. A mate pair can be thought of as a
single fragment, with a large gap in the middle and SNPs read at both ends.

A class of inputs in which the SNP matrix has gaps but that can still be
solved in polynomial time is the following. A 0-1 matrix is said to have the
consecutive ones property (C1P) if the columns can be rearranged so that in
each row the 1s appear consecutively. Analogously, we say that a SNP matrix is
C1P if there exists a permutation $\pi$ of the SNPs such that each fragment covers
a consecutive (in $\pi$) subset of SNPs. Since such a $\pi$ can be found in polynomial
time, and permuting the columns by $\pi$ yields a gapless SNP matrix, it follows
from the results for gapless matrices above that:

Corollary 2. The problems MFR, LSH and MSR can be solved in polynomial
time on SNP matrices that are C1P.

For those matrices that are not C1P, it is easy to show NP-hardness for many
problems by using the following lemma, which shows how to encode a graph into a
SNP matrix.

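Lemma 7. Let $G = (V, E)$ be a graph. Then there exists a SNP matrix $M$ such
that $G_F(M) = G$.

Proof. $G$ can be made into a $|V| \times |E|$ SNP matrix by having a fragment for
each node in $V$ and a SNP for each edge $e = \{i, j\}$, with value A in $i$, B in $j$, and a
gap in every other fragment. Two fragments then conflict exactly when they share
an edge column, i.e., when they are adjacent in $G$.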
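The encoding in the proof of Lemma 7 is easy to write down. A minimal sketch, assuming rows are strings over `A`, `B`, and `-` (gap) and using hypothetical function names; `fragment_conflict_graph` recomputes $G_F(M)$ so that the encoding can be checked against the input graph:

```python
def encode_graph(num_nodes, edges):
    """Lemma 7: one fragment (row) per node and one SNP (column) per edge.

    Column e = {i, j} holds A in row i, B in row j, and a gap '-' elsewhere.
    """
    rows = [["-"] * len(edges) for _ in range(num_nodes)]
    for col, (i, j) in enumerate(edges):
        rows[i][col] = "A"
        rows[j][col] = "B"
    return ["".join(row) for row in rows]


def fragment_conflict_graph(M):
    """Edges (i, j), i < j, of G_F(M): rows with different symbols in a shared column."""
    edges = set()
    for i in range(len(M)):
        for j in range(i + 1, len(M)):
            if any(a != "-" and b != "-" and a != b for a, b in zip(M[i], M[j])):
                edges.add((i, j))
    return edges
```

For the graph with edges (0, 1), (1, 2), (0, 2), (2, 3), the encoding is the four rows `A-A-`, `BA--`, `-BBA`, `---B`, and its fragment conflict graph has exactly those four edges.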

We can now give a simple proof that MFR is NP-hard. 

Theorem 5. MFR is NP-hard. 

Proof. We use a reduction from the following NP-hard problem: given a
graph $G = (V, E)$, remove the fewest nodes to make it bipartite. This
is exactly MFR when $G$ is encoded into a SNP matrix as described in Lemma 7.
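Combining the encoding of Lemma 7 with an exhaustive search makes the reduction concrete on tiny graphs: the minimum number of fragments to remove equals the minimum number of nodes whose deletion makes $G$ bipartite. The search below is exponential and only meant as an illustration; all function names are hypothetical.

```python
from itertools import combinations


def encode_graph(num_nodes, edges):
    """Lemma 7 encoding: A in row i and B in row j for the column of edge {i, j}."""
    rows = [["-"] * len(edges) for _ in range(num_nodes)]
    for col, (i, j) in enumerate(edges):
        rows[i][col], rows[j][col] = "A", "B"
    return ["".join(row) for row in rows]


def conflict_edges(M):
    """Fragment conflicts: rows with different non-gap symbols in some column."""
    return [(i, j) for i, j in combinations(range(len(M)), 2)
            if any(a != "-" and b != "-" and a != b for a, b in zip(M[i], M[j]))]


def is_bipartite(nodes, edges):
    """2-color the subgraph induced by `nodes` with an iterative DFS."""
    adj = {v: [] for v in nodes}
    for i, j in edges:
        if i in adj and j in adj:
            adj[i].append(j)
            adj[j].append(i)
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    stack.append(w)
                elif color[w] == color[u]:
                    return False
    return True


def mfr_brute_force(M):
    """Smallest set of rows whose removal leaves a bipartite fragment conflict graph.

    Exhaustive search: exponential, for tiny illustrative instances only.
    """
    edges = conflict_edges(M)
    rows = range(len(M))
    for size in range(len(M) + 1):
        for removed in combinations(rows, size):
            if is_bipartite([r for r in rows if r not in removed], edges):
                return set(removed)
```

On the encoded triangle one fragment must be removed, on the encoded $K_4$ two, and on the encoded 4-cycle none.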

In the following theorem, we show that it is the very presence of gaps in 
fragments, and not their quantity, that makes the problem difficult. 

Theorem 6. MFR is NP-hard for SNP matrices in which each fragment has at 
most 1 gap. 

The proof is through a polynomial time reduction from Max2SAT. Consider an
instance $\Phi(n, k)$ of Max2SAT with $k$ clauses over $n$ boolean variables.
Denote the clauses of $\Phi$ as $C_1, C_2, \ldots, C_k$, and the variables as $x_1, x_2, \ldots, x_n$.
By definition, each clause contains at most 2 literals. Without loss of generality, we
assume that a variable appears at most once in a clause.

We transform the instance $\Phi(n, k)$ into a SNP matrix $M_\Phi$ with a set $\mathcal{F}$ of
$n(nk + 3k + 1) + 3k$ fragments (rows) and a set $S$ of $2n + 5k$ SNPs (columns).
See Fig. 3. Each variable $x$ contributes $nk + 3k + 1$ fragments, which can be
partitioned into 3 sets with $k$, $k$, and $nk + k + 1$ fragments, respectively. The
fragments with labels $f_{x,1}, \ldots, f_{x,k}$ form the set $T(x)$ (true). Similarly, the set
$F(x)$ (false) is the set containing fragments $\bar{f}_{x,1}, \ldots, \bar{f}_{x,k}$, and the set $S(x)$
(support) contains the remaining $nk + k + 1$ fragments. No two fragments from
different sets can be in the same haplotype, and any number of fragments from the
same set can be in the same haplotype. Denote a literal of the variable $x$ as
$x_l \in \{x, \bar{x}\}$. If $x_l = x$, then the fragment $f_{x_l,i}$ corresponds to $f_{x,i}$, and
$\bar{f}_{x_l,i} = \bar{f}_{x,i}$. Similarly, if $x_l = \bar{x}$, then $f_{x_l,i} = \bar{f}_{x,i}$ and $\bar{f}_{x_l,i} = f_{x,i}$.

Fig. 3. Gadget for the reduction.

Next, consider a clause $C_i = (x + y)$. There are three other fragments
$C_{i,1}, C_{i,2}, C_{i,3}$ for each clause. The clause fragments are located in between the
fragments for $x$ and $y$; $C_{i,1}$ and $C_{i,2}$ conflict with each other. Extend fragment
$f_{x,i}$ with a mate pair, so that it shares a SNP (and a conflict) with the clause
fragment $C_{i,1}$. Likewise, extend $f_{y,i}$ to conflict with $C_{i,2}$. Finally, the fragment
$C_{i,3}$ is a mate pair which conflicts with both $f_{x,i}$ and $f_{y,i}$. Denote the fragment
conflict graph on $M_\Phi$ as $G(M_\Phi)$ (for simplicity, we drop the subscript $\mathcal{F}$).

Lemma 8. Given a clause $C_i = (x_i + y_i)$ of a 2SAT instance $\Phi$, the fragments
$\{C_{i,1}, C_{i,2}, f_{y_i,i}, C_{i,3}, f_{x_i,i}\}$ form a chordless cycle in $G(M_\Phi)$.

Lemma 9. Each fragment in $M_\Phi$ has at most one gap.

Lemma 10. Let $K$ be a set of fragments (rows) in $M_\Phi$. The following are
sufficient conditions for $M_\Phi[K]$ to be feasible.

1. For every variable $x$, $K \cap F(x) = \emptyset$ or $K \cap T(x) = \emptyset$.

2. For every clause $C_i = (x_i + y_i)$, $K$ does not contain all four fragments
$f_{x_i,i}$, $f_{y_i,i}$, $C_{i,1}$, and $C_{i,2}$.

Proof. The proof is constructive. We show that the nodes of $G(M_\Phi[K])$ can be
partitioned into two independent sets (shores) $K_1$ and $K_2$, so that the graph is
bipartite. We employ the following construction.

(1) For all $x$, add $S(x)$ to $K_1$, and $T(x)$ (or $F(x)$) to $K_2$. (2) For all clauses
$C_i = (x_i + y_i)$, add $C_{i,3}$ to $K_1$. (3) For all clauses $C_i = (x_i + y_i)$: (a) if
$f_{x_i,i} \notin K$, add $C_{i,1}$ to $K_2$ and $C_{i,2}$ to $K_1$; (b) else if $f_{y_i,i} \notin K$, add $C_{i,2}$
to $K_2$ and $C_{i,1}$ to $K_1$; (c) else add $C_{i,2}$ (or $C_{i,1}$) to $K_1$.
We need to show that the graphs induced by $K_1$ and $K_2$ are both independent
sets. Note that $S(x)$ in $K_1$ only has edges to $T(x)$ and $F(x)$, which are in $K_2$.
Likewise, for all $i$, $C_{i,3}$ in $K_1$ only has edges to nodes in $K_2$. If both $C_{i,1}$
and $C_{i,2}$ are present, then condition 2 ensures that $f_{x_i,i}$ and $f_{y_i,i}$ are not both
in $K$. Therefore, the construction in 3a and 3b ensures that $C_{i,1}$ and $C_{i,2}$ are
put on different shores.

Next, consider the fragments in $T(x)$ and $F(x)$ for all $x$. Condition 1 ensures
that they can all be placed in $K_2$ without conflicting edges between different
literals of $x$. Next, from 3a, $C_{i,1}$ is placed in $K_2$ only if $f_{x_i,i}$ is not in $K$. From
3b, $C_{i,2}$ is placed in $K_2$ only if $f_{y_i,i}$ is not in $K$. Thus, $K_1$ and $K_2$ induce
independent sets, and $M_\Phi[K]$ is feasible.

Lemma 11. An optimal solution to the MFR problem on $M_\Phi$ has at most $nk + k$
fragments.

Proof. Consider the set of fragments $R$ consisting of $F(x)$ for all $x$ and $C_{i,1}$ for
all clauses $C_i$. Removing $R$ satisfies the conditions of Lemma 10, implying that $R$
is a solution to the MFR problem on $M_\Phi$ with $nk + k$ fragments.

Lemma 12. Let $R$ be an optimal solution to the MFR problem on $M_\Phi$. Then
$R \cap S(x) = \emptyset$ for all $x$.

Proof. Consider an optimal solution $R$ that contains a fragment $f$ from $S(x)$,
for an arbitrary variable $x$. Let $K = \mathcal{F} - R$. As $R$ is optimal, adding $f$ to
$G(M_\Phi[K])$ must create an odd cycle $C$. Consider any other fragment $f' \in S(x)$.
By construction, $C - \{f\} + \{f'\}$ is also an odd cycle, so $f'$ cannot be in $K$
(otherwise $M_\Phi[K]$ would be infeasible). This implies that all fragments in $S(x)$
are in $R$. Therefore, $|R| \ge |S(x)| = nk + k + 1 > nk + k$, a contradiction to
Lemma 11.

Lemma 13. Let $R$ be an optimal solution to the MFR problem for $M_\Phi$. Then,
for all $x$, either $T(x) \subseteq R$ or $F(x) \subseteq R$.

Proof. Consider an optimal solution $R$ with a variable $x$ and fragments
$f_1 \in T(x) - R$ and $f_2 \in F(x) - R$. By Lemma 12, there is a fragment
$f \in S(x) - R$. By construction, $f$, $f_1$, and $f_2$ form an odd cycle, a contradiction.

Theorem 7. Consider a Max2SAT instance $\Phi$ with $n$ variables and $k$ clauses,
and the associated SNP matrix $M_\Phi$. $k' \le k$ clauses of $\Phi$ are satisfiable if and
only if there exists a solution to the MFR problem for $M_\Phi$ with $nk + k - k'$ fragments.

Proof. Consider an assignment of variables satisfying $k'$ clauses. For each variable
$x$ set to TRUE, add all the fragments in $F(x)$ to $R$, and for every variable set to
FALSE, add all the fragments in $T(x)$ to $R$. Next, consider all the clauses that are
satisfied. If $C_i = (x_i + y_i)$ is satisfied, at least one of $f_{x_i,i}$ and $f_{y_i,i}$ is in $R$,
breaking the odd cycle, and we do nothing. If $C_i$ is not satisfied, we add $C_{i,1}$
to $R$. The number of fragments in $R$ due to variables is $nk$, and the number of
fragments in $R$ due to clauses is $k - k'$. By Lemma 10, $M_\Phi[\mathcal{F} - R]$ is feasible.

Next, consider an optimal solution $R$ of the MFR problem on $M_\Phi$ with
$nk + k - k'$ fragments. For every $x$, by Lemma 13, either $F(x) \subseteq R$ or
$T(x) \subseteq R$. If $F(x) \subseteq R$, set $x$ to TRUE; otherwise, set $x$ to FALSE. We need
to show that exactly $k'$ clauses are satisfied. Note that a set $D$ of $nk$ nodes must be
in any optimal solution $R$ to the MFR problem (Lemma 13). Further, each clause
is associated with 5 fragments that induce an odd cycle in the conflict graph
(Lemma 8). At least one of these fragments must be in $R$. If a clause
$C_i = (x_i + y_i)$ is satisfied, then this fragment can be attributed to the set $D$. If,
however, $C_i$ is not satisfied, the number of fragments in $R$ increases by at least one.
Thus, if the total number of clauses satisfied is $k''$, then $|R| \ge nk + k - k''$. If
$k'' < k'$, then $|R| > nk + k - k'$, a contradiction. On the other hand, if $k'' > k'$,
then by the earlier argument, there is a solution to the MFR problem with
$nk + k - k'' < nk + k - k'$ fragments, which contradicts optimality.

We close this section with a complexity result for the snip removal problem,
which follows using lemma 0 and the fact that MAXCUT is NP-hard for
3-regular graphs.

Theorem 8. The MSR problem is NP-hard for SNP matrices with at most 2 
gaps per fragment. 
Abstract. We consider the classical problem of scheduling a set of independent
jobs on a set of unrelated machines with costs. We are given a set of $n$
monoprocessor jobs and $m$ machines, where each job is to be processed without
preemptions. Executing job $j$ on machine $i$ requires time $p_{ij} \ge 0$ and incurs
cost $c_{ij}$. Our objective is to find a schedule obtaining a tradeoff between the
makespan and the total cost. We focus on the case where the number of machines
is a fixed constant, and we propose a simple FPTAS that computes, for any
$\epsilon > 0$, a schedule with makespan at most $(1 + \epsilon)T$ and cost at most
$C_{opt}(T)$, in time $O(n(n/\epsilon)^m)$, given that there exists a schedule of makespan
$T$, where $C_{opt}(T)$ is the cost of the minimum cost schedule which achieves a
makespan of $T$. We show that the optimal makespan-cost trade-off (Pareto) curve
can be approximated by an efficient polynomial time algorithm within any desired
accuracy. Our results can also be applied to the scheduling problem where the
rejection of jobs is allowed. Each job has a penalty associated to it, and
one is allowed to schedule any subset of jobs. In this case the goal is
the minimization of the makespan of the scheduled jobs and the total
penalty of the rejected jobs.



1 Introduction 

We consider the problem of scheduling $n$ independent jobs on $m$ unrelated parallel
machines. When job $j$ is processed on machine $i$, it requires $p_{ij} \ge 0$ time units
and incurs a cost $c_{ij}$. Our objective is to find a schedule obtaining a trade-off
between the makespan and the total cost. This kind of problem has many
applications in vehicle routing, distribution systems, facility location,
and optical networks that are based on the Wavelength Division Multiplexing
(WDM) technology. Notice that this problem covers as a special case the
problem of scheduling jobs when rejections are allowed. In this problem, a
job $j$ can either be rejected, in which case a penalty $c_j$ is paid, or scheduled on
one of the machines [2].



In the last ten years, a large number of works have dealt with different variants of
the general bi-criteria problem on unrelated parallel machines [14, 17, 12, 18]. Lin
and Vitter gave a polynomial time algorithm that, given makespan $T$, cost
$C$, and $\epsilon > 0$, finds a solution of makespan $(2 + 1/\epsilon)T$ and cost $(1 + \epsilon)C$, if
there exists a schedule of makespan $T$ and cost $C$. This result is based on
solving a linear relaxation of a particular integer programming formulation, and
then rounding the fractional solution to a nearby integer solution by using the
rounding theorem of Lenstra, Shmoys and Tardos. Shmoys and Tardos
improved this result by presenting a polynomial-time algorithm that, given $T$
and $C$, finds a schedule of cost $C$ and makespan $2T$, if there is a schedule of
makespan $T$ and cost $C$. The main difference with the previous result is the
introduction of a new rounding technique for the fractional solution which does
not require that the solution to be rounded be a vertex of the linear relaxation.
This result cannot be substantially improved in the case where the number of
machines is part of the input, since Lenstra et al. proved that for
the single-criterion problem of minimizing the makespan, no polynomial-time
$(1 + \epsilon)$-approximation algorithm with $\epsilon < 1/2$ exists, unless $P = NP$.



Hence, many papers deal with the natural question of how well the problem
can be approximated when there is only a constant number of machines [10, 12].
For the bi-criterion scheduling problem that we consider in this paper, Jansen
and Porkolab proposed a fully polynomial-time approximation scheme (FPTAS)
that, given values $T$ and $C$, computes for any $\epsilon > 0$ a schedule in time
$O(n(m/\epsilon)^{O(m)})$ with makespan at most $(1 + \epsilon)T$ and cost at most $(1 + \epsilon)C$, if
there exists a schedule of makespan $T$ and cost $C$. This result relies in part on
linear programming and uses quite sophisticated methods such as the logarithmic
potential price directive decomposition method of Grigoriadis and Khachiyan.

In this paper, we propose an FPTAS that, given $T$, computes for any $\epsilon > 0$ a
schedule with makespan $T'$ such that $T' \le (1 + \epsilon)T$ and cost at most $C_{opt}(T)$,
in $O(n(n/\epsilon)^m)$ time, with $C_{opt}(T)$ the cost of the minimum cost schedule which
achieves a makespan of $T$.

The algorithm of Jansen and Porkolab, given $T$ and $C$, can only guarantee
that the obtained cost is at most $(1 + \epsilon)C$, whereas our result guarantees a cost
(in the worst case) equal to $C$ and offers the same quality with respect to the
makespan, i.e., a schedule of makespan at most $(1 + \epsilon)T$. In addition, our
methods are much simpler, since we use just a combination of well-known
combinatorial methods (dynamic programming and rounding) and we require
only the knowledge of $T$. On the other hand, the complexity of our algorithm is
worse than that of Jansen and Porkolab.
In the following table, we compare our result with the most important results
for the unrelated parallel machines scheduling problem with costs.

               Reference             Quality                 Time
m general      Lin and Vitter        (2 + 1/ε)T, (1 + ε)C    poly(m, n)
               Shmoys and Tardos     2T, C                   poly(m, n)
m constant     Jansen and Porkolab   (1 + ε)T, (1 + ε)C      O(n(m/ε)^{O(m)})
               this paper            (1 + ε)T, C             O(n(n/ε)^m)


Furthermore, we have adapted our algorithm to find the schedule with the
optimal makespan $T_{opt}$ (within a factor $1 + \epsilon$) and the smallest total cost (with
respect to $T_{opt}$) without knowing the value of $T_{opt}$, i.e., a schedule $s^*$ such that
$T_{opt} \le T(s^*) \le (1 + \epsilon)T_{opt}$ and $C(s^*) \le C_{opt}(T_{opt})$, where $T_{opt}$ is the optimal
makespan and $C_{opt}(T_{opt})$ is the minimal cost of a schedule with makespan $T_{opt}$.
The complexity of our algorithm in this case is in 0((^)"®+i logm). 

Pareto curves and approximation. Usually, given an optimization problem, we
search for a feasible solution optimizing its objective function. In multiobjective
optimization, we are interested not in a single optimal solution, but in the set of
all possible solutions whose vector of the various objective criteria is not
dominated by the vector of another solution. This set, known as the Pareto curve,
captures the intuitive notion of "trade-off" between the various objective criteria.
Even for the simplest problems (matching, minimum spanning tree, shortest path)
and even for two objectives, determining whether a point belongs to the Pareto
curve is an NP-hard problem. Moreover, this set can be exponential in size, and so,
until recently, all computational approaches to multiobjective optimization were
concerned with less ambitious goals, such as optimizing the various criteria
lexicographically. Recently, an approximate version of this concept has been
studied, the $\epsilon$-approximate Pareto curve. Informally, an $\epsilon$-approximate Pareto
curve is a set of solutions that approximately dominate all other solutions, i.e., for
every other solution, the set contains a solution that is at least as good
approximately (within a factor $1 + \epsilon$) in all objectives.
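For two minimized objectives such as (makespan, cost), an $\epsilon$-approximate Pareto curve of a finite set of candidate solutions can be extracted with a simple greedy pass. The sketch below is a hypothetical illustration of the definition, not the construction used in the cited work or in this paper: it sorts the exact Pareto points by makespan and keeps a point only when the last kept point's cost exceeds $(1 + \epsilon)$ times its cost.

```python
def pareto_front(points):
    """Exact Pareto set of (makespan, cost) pairs, both to be minimized.

    Returned in increasing makespan order (hence decreasing cost order).
    """
    front = []
    for makespan, cost in sorted(set(points)):
        if not front or cost < front[-1][1]:
            front.append((makespan, cost))
    return front


def eps_pareto(points, eps):
    """Greedy eps-approximate Pareto set for two minimized objectives.

    Guarantees: for every input point p there is a kept q with
    q[0] <= (1 + eps) * p[0] and q[1] <= (1 + eps) * p[1].
    """
    kept = []
    for makespan, cost in pareto_front(points):
        # the last kept point already has a smaller makespan; keep the new
        # point only if the last kept cost is more than a (1 + eps) factor worse
        if not kept or kept[-1][1] > (1 + eps) * cost:
            kept.append((makespan, cost))
    return kept
```

With the points (10, 100), (11, 99), (12, 60), (20, 59), (21, 30), (25, 40) and $\epsilon = 0.1$, the exact Pareto curve has five points, while the greedy keeps only (10, 100), (12, 60), (21, 30), and every input point is still within a factor 1.1 in both objectives of one of them.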

It has been shown that under some very general conditions there is always a
polynomially (in the size of the instance and $1/\epsilon$) succinct $\epsilon$-approximate
Pareto curve. For discrete optimization problems with linear objective functions,
it has also been shown that there is a fully polynomial time approximation
scheme (FPTAS) for calculating the $\epsilon$-approximate Pareto curve if the exact
version of the problem is pseudo-polynomial time solvable. A linear objective
function means that the criterion can be expressed as a scalar product between
the solution (represented by a vector) and a fixed vector of costs. This result
can be applied to the problem we consider by embedding the makespan problem
(which is the maximum of a fixed number of machine loads, i.e., the maximum of
a fixed number of linear functions) into the multi-criteria problem where every
machine load is a criterion and the total cost is a criterion.

We study in this paper a stronger version of approximate Pareto curves, the
$(\epsilon, 0)$-Pareto curve, and give an efficient polynomial time algorithm to construct
it. Informally, we search for a set of solutions that dominates all other solutions
approximately (within a factor $1 + \epsilon$) in all but one objective, and is optimal
with respect to the last objective. For the scheduling problem that we consider,
the solutions that belong to the $(\epsilon, 0)$-Pareto curve approximately dominate all
other solutions with respect to the makespan criterion, but they must give the
smallest possible value of the total cost. In this paper, we propose an FPTAS for
constructing an (e, 0)-Pareto curve which runs in time 0(^(n/e)™). 

Scheduling with rejections. In this kind of problem, for each job we have
to decide whether to accept or reject it. For the accepted jobs we pay the
makespan of the constructed schedule, and for the rejected jobs we pay the
corresponding rejection penalties. The objective is to find a feasible
schedule minimizing the sum of the makespan of the accepted jobs plus the total
penalty. Bartal et al. have considered the online and offline versions of this
problem in the case of identical machines, i.e., when the processing times of the
jobs are machine independent. Epstein and Sgall have considered the problem in
the case of uniformly related machines. More recently, Hoogeveen, Skutella,
and Woeginger considered the preemptive version of the unrelated machines case
and presented a complete classification of the different variants of the problem.
It is easy to see that the results presented in this paper can be applied to
the non-preemptive scheduling problem with rejections in the case of unrelated
parallel machines. To take rejection into account, it is sufficient to add a dummy
machine $m + 1$ such that the processing times of all the jobs $j$ are zero on this
machine and $c_{m+1,j} = c_j$. In fact, this machine receives all the rejected jobs.
The cost of every job $j$ on the other machines is equal to zero. Therefore, we can
obtain a FPTAS for constructing an (e,0)-Pareto curve in time 0( — 
for scheduling with rejection on unrelated parallel machines. This curve contains
a set of solutions dominating all other solutions approximately (within a factor
$1 + \epsilon$) in the makespan criterion, and it is optimal with respect to the total
penalty criterion.
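The dummy-machine transformation for rejection can be sketched as follows. The list-of-lists layout `p[i][j]` (machine `i`, job `j`, both 0-based) and the function names are assumptions for illustration. `evaluate` computes the makespan and total cost of a given assignment, showing that the dummy machine adds nothing to the makespan and exactly the penalties of the rejected jobs to the cost.

```python
def add_rejection_machine(p, penalties):
    """Turn a rejection instance into an instance with costs.

    p[i][j] is the processing time of job j on machine i (i = 0..m-1).
    Returns (p2, c2) for m + 1 machines: the original machines keep their
    processing times and get cost 0, and the dummy machine m gets processing
    time 0 and cost equal to the job's rejection penalty.
    """
    m, n = len(p), len(penalties)
    p2 = [list(row) for row in p] + [[0] * n]
    c2 = [[0] * n for _ in range(m)] + [list(penalties)]
    return p2, c2


def evaluate(p, c, assignment):
    """assignment[j] is the machine of job j; returns (makespan, total cost)."""
    loads = [0] * len(p)
    cost = 0
    for j, i in enumerate(assignment):
        loads[i] += p[i][j]
        cost += c[i][j]
    return max(loads), cost
```

For two machines with times `[[3, 5], [4, 2]]` and penalties `[7, 1]`, rejecting the first job and running the second on machine index 1 evaluates to makespan 2 and cost 7.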

Organization of the paper. In the next section we present an exact dynamic
programming algorithm, while in Section 3 we use it in order to obtain an FPTAS
in the case where the value of the makespan is given. We also show how to
modify this FPTAS in order to compute the optimal makespan (within a factor
$1 + \epsilon$) at the minimum cost. Section 4 is devoted to the construction of an FPTAS
approximating the Pareto curve. In the sequel, we shall consider the case of two
machines. The generalization to $m > 2$ machines, with $m$ fixed, is direct.



2 An Exact Dynamic Programming Algorithm 

For a schedule $s$, possibly partial, i.e., involving only a subset of all the jobs,
we denote by $C(s)$ and $T(s)$ its cost and makespan, respectively. We shall also
use the notation $s = (m_1, m_2) = (J_1, J_2)$, meaning that $m_1$ (resp. $m_2$) is the
total processing time on machine 1 (resp. 2), and $J_1$ (resp. $J_2$) is the set of
jobs scheduled on machine 1 (resp. 2). If $s = (J_1, J_2)$, then the makespan is
defined by $T(s) = \max\{m_1 = \sum_{j \in J_1} p_{1j},\ m_2 = \sum_{j \in J_2} p_{2j}\}$ and the cost by
$C(s) = \sum_{j \in J_1} c_{1j} + \sum_{j \in J_2} c_{2j}$.

We define $C_{opt}(T)$ as the minimum cost of a schedule with makespan $T$, if any,
i.e., $C_{opt}(T) = \min_{s : T(s) = T} C(s)$. Otherwise, $C_{opt}(T) = +\infty$.

In this section, we give a pseudo-polynomial time algorithm which, given $T$,
returns a schedule $s$ with makespan $T(s) = T$ and cost $C(s) = C_{opt}(T)$ if such
a schedule exists.

Let $J = \{1, 2, \ldots, n\}$ be the set of jobs. We shall assume, in this section only,
that the processing times $p_{ij}$ are integers.

The states of the dynamic programming algorithm are the tuples $E(j, m_1, m_2)$,
with $1 \le j \le n$ and $m_1, m_2 \in \{0, 1, \ldots, T\}$.

By definition, $E_c(j, m_1, m_2) = +\infty$ if a schedule involving jobs $\{1, 2, \ldots, j\}$
which achieves a completion time equal to $m_1$ on machine 1 and $m_2$ on machine
2 does not exist. Otherwise, $E_c(j, m_1, m_2)$ is the minimum cost among all
such schedules. We have



$$E_c(j, m_1, m_2) = \min\{E_c(j - 1, m_1 - p_{1j}, m_2) + c_{1j},\; E_c(j - 1, m_1, m_2 - p_{2j}) + c_{2j}\}.$$



Initially, we set $E_c(1, 0, p_{21}) = c_{21}$, $E_c(1, p_{11}, 0) = c_{11}$, and otherwise
$E_c(1, m_1, m_2) = +\infty$.

To be able to retrieve the schedule, we need an additional variable
$E_s(j, m_1, m_2)$ which stores, for each state, the partial schedule constructed so
far, i.e., the jobs that are scheduled on each machine. For example, if we have
$E_c(j, m_1, m_2) = E_c(j - 1, m_1 - p_{1j}, m_2) + c_{1j}$, it means that job $j$ is scheduled
on machine 1, and therefore we have
$E_s(j, m_1, m_2) = E_s(j - 1, m_1 - p_{1j}, m_2) \cup \{j \to 1\}$.

The above backward recursion can be turned into a forward recursion that is easy
to implement efficiently. The algorithm is the following one:

for j = 1 to n - 1
  for m_1 = 0 to T by step of 1 (m_1 is always an integer)
    for m_2 = 0 to T by step of 1 (m_2 is always an integer)
      if m_1 + p_{1,j+1} <= T: E_c(j+1, m_1 + p_{1,j+1}, m_2) <- min{E_c(j+1, m_1 + p_{1,j+1}, m_2), E_c(j, m_1, m_2) + c_{1,j+1}}
      if m_2 + p_{2,j+1} <= T: E_c(j+1, m_1, m_2 + p_{2,j+1}) <- min{E_c(j+1, m_1, m_2 + p_{2,j+1}), E_c(j, m_1, m_2) + c_{2,j+1}}

Initially, we set $E_c(1, 0, p_{21}) = c_{21}$, $E_c(1, p_{11}, 0) = c_{11}$, and all other
entries $E_c(j, m_1, m_2)$ are set to $+\infty$.

To obtain a solution with a makespan of $T$ and cost $C_{opt}(T)$, we select all the
states $E(n, T, m_2)$ and $E(n, m_1, T)$, if any, with $m_1, m_2 \in \{0, 1, \ldots, T\}$, and
among those states we choose a state with minimum cost $E_c$.
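A compact sketch of the exact dynamic program for two machines, assuming integer processing times. It is a hypothetical illustration rather than the authors' code: instead of a full $(T + 1)^2$ table per job it keeps only reachable states in a dictionary, starts from the empty schedule (state (0, 0) with cost 0) instead of the explicit initialization for job 1, stores the assignment alongside each state in the role of $E_s$, and discards any state whose load exceeds $T$. Machines are indexed 0 and 1.

```python
import math


def exact_dp(p, c, T):
    """Exact two-machine DP (integer processing times).

    p[i][j], c[i][j]: processing time and cost of job j on machine i in {0, 1}.
    Returns (cost, assignment) for a minimum-cost schedule whose makespan is
    exactly T, where assignment[j] is the machine of job j, or None if no
    schedule has makespan T.
    """
    n = len(p[0])
    # states after scheduling the first j jobs: (m1, m2) -> (min cost, assignment)
    states = {(0, 0): (0, ())}
    for j in range(n):
        nxt = {}
        for (m1, m2), (cost, assign) in states.items():
            # machine 0 receives job j, or machine 1 receives job j
            for i, load in ((0, (m1 + p[0][j], m2)), (1, (m1, m2 + p[1][j]))):
                if max(load) > T:      # loads above T can never give makespan T
                    continue
                cand = cost + c[i][j]
                if cand < nxt.get(load, (math.inf, None))[0]:
                    nxt[load] = (cand, assign + (i,))
        states = nxt
    # keep the final states E(n, T, m2) and E(n, m1, T); pick the cheapest
    finals = [v for (m1, m2), v in states.items() if max(m1, m2) == T]
    return min(finals, key=lambda v: v[0]) if finals else None
```

For example, `exact_dp([[2, 3], [2, 1]], [[1, 1], [5, 5]], 3)` returns `(6, (1, 0))`: the first job goes to machine index 1 and the second to machine index 0, giving loads (3, 2). With $T = 4$ no schedule has makespan exactly 4, so the call returns `None`.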
The number of states is $n(T + 1)^2$, and therefore this algorithm is only
pseudo-polynomial. In the next section, we show how to obtain a fully polynomial
time approximation scheme from it.



3 An Approximation Scheme Given the Value of the Makespan

Let us consider $0 < \varepsilon < 1$ such that, without loss of generality, $1/\varepsilon$ is an integer. The technique that we use to obtain a fully polynomial time approximation scheme is inspired by the classical method of Horowitz and Sahni […].

We subdivide the interval $]0, (1+2\varepsilon)T]$ into $n/\varepsilon + 2n$ subintervals $I_k = ](k-1)l, kl]$, $1 \le k \le n/\varepsilon + 2n$, each one of length $l = \varepsilon T/n$.

For a real $x \in ]0, (1+2\varepsilon)T]$, we denote by $\lceil x \rceil$ the right extremity of the subinterval to which $x$ belongs, i.e. $\lceil x \rceil = kl$, with $k$ such that $x \in I_k$. We set $\lceil 0 \rceil = 0$.
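The rounding $\lceil x \rceil$ can be computed directly as $kl$ with $k$ the ordinary ceiling of $x/l$. A minimal sketch, assuming exact rational inputs so that grid points are not perturbed by floating-point error; `round_up` is a hypothetical helper name:

```python
from fractions import Fraction
import math


def round_up(x, T, n, eps):
    """Right extremity of the subinterval I_k = ](k-1)l, kl] containing x,
    where l = eps*T/n; by convention the rounding of 0 is 0."""
    x = Fraction(x)
    l = Fraction(eps) * Fraction(T) / n
    if x == 0:
        return Fraction(0)
    k = math.ceil(x / l)  # smallest k with x <= k*l, i.e. x lies in I_k
    return k * l
```

For example, with $T = 10$, $n = 5$ and $\varepsilon = 1/2$ we get $l = 1$, so $3/2$ is rounded to $2$, and $2$ stays $2$ since $2 \in I_2 = ]1, 2]$.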

The approximation algorithm consists in running the previous exact dynamic programming algorithm on the modified instance where each processing time $p_{ij}$ is replaced by $\lceil p_{ij} \rceil$. The loop over $m_1$ is replaced by "for $m_1 = 0$ to $(1+2\varepsilon)T$ by step of $l$", and similarly for the loop over $m_2$. Later we will explain how to find the appropriate solution among all the computed states.

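For a set of jobs $J' \subseteq J$ and a machine $m \in \{1, 2\}$, we denote $\lceil J' \to m \rceil = \sum_{j \in J'} \lceil p_{mj} \rceil$, the total rounded processing time of the jobs of $J'$ on machine $m$.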

Definition 1. We denote by $\lceil T \rceil(s)$ the makespan of the schedule $s$ assuming that the processing time $p_{ij}$ of each job has been rounded up to $\lceil p_{ij} \rceil$. Under the same assumption, we denote by $\lceil s \rceil = (m_1, m_2)$ the total processing time on each machine for the solution $s$. In other words, if $s = (J_1, J_2)$ then $\lceil s \rceil = (\lceil J_1 \to 1 \rceil, \lceil J_2 \to 2 \rceil)$.

Proposition 1. For any schedule $s$, we have $0 \le \lceil T \rceil(s) - T(s) \le \varepsilon T$.

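Proof. Let $s = (J_1, J_2)$ be the initial schedule, with machine loads $(m_1, m_2)$, and let $\lceil s \rceil = (m'_1, m'_2)$ be the loads obtained after rounding up the processing time of each job. Since rounding never decreases a processing time, $m'_1 \ge m_1$ and $m'_2 \ge m_2$, hence $\lceil T \rceil(s) \ge T(s)$. Let us assume that $T(s) = m_1$. Then we have either $\lceil T \rceil(s) = m'_1$ or $\lceil T \rceil(s) = m'_2$. Let us assume first that $\lceil T \rceil(s) = m'_1$. We have $T(s) = m_1 = \sum_{j \in J_1} p_{1j}$ and $\lceil T \rceil(s) = m'_1 = \sum_{j \in J_1} \lceil p_{1j} \rceil$. Notice now that, since $l$ is the length of each subinterval, for any $x \ge 0$ we have $\lceil x \rceil - x < l$. Since $|J_1| \le n$, we have $\lceil T \rceil(s) - T(s) \le nl = \varepsilon T$. Let us assume now that $\lceil T \rceil(s) = m'_2$. Since $T(s) = m_1$, we have $m_1 \ge m_2$. So $\lceil T \rceil(s) - T(s) = m'_2 - m_1 \le m'_2 - m_2 \le nl = \varepsilon T$ (with the same argument as above). The case $T(s) = m_2$ is symmetric.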
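Proposition 1 can be checked by brute force on a small random instance. The sketch below (two machines, exact rationals, hypothetical helper names) enumerates every assignment and tests both inequalities; by the proposition it should always report `True`.

```python
from fractions import Fraction
from itertools import product
import math
import random


def ceil_grid(x, l):
    """Right extremity k*l of the subinterval ](k-1)l, kl] containing x (0 maps to 0)."""
    return Fraction(0) if x == 0 else math.ceil(x / l) * l


def check_prop1(p, T, eps):
    """Check 0 <= [T](s) - T(s) <= eps*T for every two-machine schedule s."""
    n = len(p[0])
    l = eps * T / n
    for assign in product((0, 1), repeat=n):
        loads, rounded = [Fraction(0)] * 2, [Fraction(0)] * 2
        for j, i in enumerate(assign):
            loads[i] += p[i][j]                 # original loads
            rounded[i] += ceil_grid(p[i][j], l)  # loads with rounded times
        gap = max(rounded) - max(loads)
        if not 0 <= gap <= eps * T:
            return False
    return True


random.seed(0)
p = [[Fraction(random.randint(1, 40), 7) for _ in range(6)] for _ in range(2)]
print(check_prop1(p, T=Fraction(10), eps=Fraction(1, 4)))
```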

Definition 2. We denote by $[(m_1, m_2)]_k$ the set of all schedules $s$ involving the jobs $\{1, 2, \dots, k\}$ such that $\lceil s \rceil = (m_1, m_2)$.
Proposition 2. Let $k \in \{1, \dots, n\}$ and let $\hat{s} = E_s(k, m_1, m_2)$ be the (partial, if $k < n$) schedule computed by the dynamic programming algorithm. Then for all $s \in [(m_1, m_2)]_k$ we have $C(\hat{s}) \le C(s)$.

Proof. Consider the simplified search graph of the dynamic programming depicted in Figure 1. A directed path of length $l$ from the node $r$ in this search graph corresponds to a set of decisions involving the jobs $\{1, 2, \dots, l\}$, in that order, which consist in choosing for each job the machine where it is scheduled.

Fig. 1. The search graph of the dynamic programming with merging of states (from the root r down to the states E(k, ·, ·)).




Notice now that if we do not merge any state, then there is always a path from the node $r$ of the search space to any solution $s$. Indeed, given an arbitrary schedule $s$, we can reorder the jobs on each machine so that the job indices on each machine are in increasing order, without changing the makespan or the cost of the schedule. It is now clear that there is a path in the search graph from $r$ which leads to this new schedule, which is equivalent to schedule $s$.

The dynamic programming algorithm computes a set of states at each stage. Namely, during the $(l-1)$-th stage, the states $E(l-1, m_1, m_2)$ are computed for all feasible values of $m_1$ and $m_2$. Then, the $l$-th job is considered and "added" to each of the states $E(l-1, \cdot, \cdot)$ to obtain the states $E(l, \cdot, \cdot)$. During this stage, some of the states $E(l, \cdot, \cdot)$ are merged: the one which has the lowest cost remains, whereas the others disappear. If the statement of Proposition 2 is not true, then there is a state that was wrongly eliminated. Let us assume that the two states $E(l-1, \alpha, \beta) = (J_1, J_2)$ and $E(l-1, \alpha', \beta') = (J'_1, J'_2)$ have been merged when job $l$ was added to the first state, say on machine 1, and to the second state on machine 2. This means that $\lceil J_1 \to 1 \rceil + \lceil \{l\} \to 1 \rceil = \lceil J'_1 \to 1 \rceil$ and $\lceil J_2 \to 2 \rceil = \lceil J'_2 \to 2 \rceil + \lceil \{l\} \to 2 \rceil$. Let $s_1 = (J_1 \cup \{l\}, J_2)$ and $s_2 = (J'_1, J'_2 \cup \{l\})$. Assuming that $C(s_1) \le C(s_2)$, the first state remained, whereas the second one was forgotten. Now let us suppose that, because the partial schedule $s_2$ has been eliminated, there no longer exists a path in the search graph which leads to solution $s$. This is not a problem, since in that case there is a path which leads from $s_1$ to a solution $s'$ which is better than $s$. Let $s = (J''_1, J''_2)$ with $\lceil s \rceil = (m_1, m_2)$. Since there was a path from $s_2$ to $s$, the schedule $s' = (J_1 \cup \{l\} \cup (J''_1 \setminus J'_1),\ J_2 \cup (J''_2 \setminus (J'_2 \cup \{l\})))$ obtained from $s_1$ is well defined. Then $\lceil s' \rceil = (m_1, m_2)$, since $s_1$ and $s_2$ have the same rounded loads, and $C(s') - C(s) = C(s_1) - C(s_2) \le 0$. If $s'$ is not reachable from $s_1$ because of subsequent mergings, the same reasoning shows that there exists a solution reachable from $s_1$ which has a lower cost than $s'$ and the same rounded loads $(m_1, m_2)$ as $s$.

Theorem 1. Given $T > 0$ and $\varepsilon > 0$, if there exists a schedule of makespan between $T$ and $(1+\varepsilon)T$, the schedule $s_{dyna}$ returned by the dynamic programming algorithm satisfies $(1-\varepsilon)T \le T(s_{dyna}) \le (1+2\varepsilon)T$ and $C(s_{dyna}) \le \min_{t \in [T, (1+\varepsilon)T]} C_{opt}(t) \le C_{opt}(T)$. Moreover, the running time is $O(n(n/\varepsilon)^m)$. For the problem with rejections allowed, the running time is $O(n(n/\varepsilon)^{m+1})$.




Proof. Let $s_{opt} = (J_{o,1}, J_{o,2})$ be a schedule with makespan between $T$ and $(1+\varepsilon)T$ and cost $\min_{t \in [T, (1+\varepsilon)T]} C_{opt}(t)$. Let $\lceil s_{opt} \rceil = (m'_1, m'_2)$ and let $s_d = E_s(n, m'_1, m'_2)$. We have $s_{opt} \in [(m'_1, m'_2)]_n$, and so, using Proposition 2, we obtain $C(s_d) \le C(s_{opt}) = \min_{t \in [T, (1+\varepsilon)T]} C_{opt}(t)$. We have $T \le T(s_{opt}) \le (1+\varepsilon)T$, and using Proposition 1 we have $T \le \lceil T \rceil(s_{opt}) \le (1+2\varepsilon)T$. By definition of $s_d$ we have $\lceil T \rceil(s_d) = \lceil T \rceil(s_{opt})$, and using Proposition 1 again we obtain $(1-\varepsilon)T \le T(s_d) \le (1+2\varepsilon)T$.

To obtain the desired schedule $s_{dyna}$, we examine all the states returned by the dynamic programming algorithm with a rounded makespan between $T$ and $(1+2\varepsilon)T$, and we retain among them the schedule $s_{dyna}$ which achieves the minimum cost. We unround the solution we return, i.e. each processing time is $p_{ij}$ instead of $\lceil p_{ij} \rceil$. Of course we will find the schedule $s_d$, but perhaps a better schedule, and so $C(s_{dyna}) \le C(s_d)$. The result follows, and the number of states that we need to consider is at most $n(n/\varepsilon + 2n)^2$.
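Putting the pieces together, the following is a minimal sketch of $Dyna(T, \varepsilon)$ for two machines, under the assumptions of this section ($T > 0$, $1/\varepsilon$ an integer). It rounds the processing times once at the beginning, indexes states by rounded loads counted in units of $l$ (integers from $0$ to $n/\varepsilon + 2n$), keeps the cheapest assignment per state, and selects among the final states whose rounded makespan lies in $[T, (1+2\varepsilon)T]$. The caller evaluates the returned assignment with the original (unrounded) processing times. The function name and data layout are illustrative assumptions.

```python
from fractions import Fraction
import math


def dyna(p, c, T, eps):
    """Sketch of Dyna(T, eps) for two machines (T > 0, 1/eps an integer).

    p[i][j], c[i][j]: processing time and cost of job j on machine i.
    Returns (cost, assignment) of the cheapest final state whose rounded
    makespan lies in [T, (1 + 2 eps) T], or None if there is none.
    """
    n = len(p[0])
    eps = Fraction(eps)
    l = eps * Fraction(T) / n
    lo = int(T / l)                   # T in units of l, i.e. n/eps
    hi = int((1 + 2 * eps) * T / l)   # (1 + 2 eps) T in units of l
    # Rounded processing times in units of l: ceil(p / l); 0 stays 0.
    k = [[0 if x == 0 else math.ceil(Fraction(x) / l) for x in row] for row in p]
    states = {(0, 0): (0, ())}
    for j in range(n):
        nxt = {}
        for (a, b), (cost, assign) in states.items():
            for i, key in ((0, (a + k[0][j], b)), (1, (a, b + k[1][j]))):
                if max(key) > hi:
                    continue  # rounded load beyond (1 + 2 eps) T: discard
                new = cost + c[i][j]
                if key not in nxt or new < nxt[key][0]:
                    nxt[key] = (new, assign + (i,))  # merging of states
        states = nxt
    cands = [v for (a, b), v in states.items() if lo <= max(a, b) <= hi]
    return min(cands) if cands else None
```

A brute-force comparison on small instances is a convenient way to check the guarantees of Theorem 1 against such an implementation.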

Remark 1. The case $T = 0$ can easily be solved by assigning each job to a machine where its processing time is zero and on which, among such machines, the job incurs the minimum cost. If such a machine cannot be found for every job, then obviously the optimal makespan $T_{opt}$ is strictly greater than 0.
Remark 2. Notice that rounding at the beginning of the dynamic programming algorithm is equivalent to rounding during its execution, since we have the following simple proposition.



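Proposition 3. Let $p_1, \dots, p_n$ be any positive real numbers. Then we have

$$\lceil \cdots \lceil \lceil \lceil p_1 \rceil + p_2 \rceil + p_3 \rceil + \cdots + p_n \rceil = \lceil p_1 \rceil + \lceil p_2 \rceil + \cdots + \lceil p_n \rceil.$$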

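Proof. We first show that $\lceil \lceil p_1 \rceil + p_2 \rceil = \lceil p_1 \rceil + \lceil p_2 \rceil$. Let us assume that $p_1 \in I_{k_1}$ and $p_2 \in I_{k_2}$. Then $\lceil p_1 \rceil = k_1 l$ and $\lceil p_2 \rceil = k_2 l$. Since $p_2 \in ](k_2-1)l, k_2 l]$, we have $\lceil p_1 \rceil + p_2 \in ](k_1+k_2-1)l, (k_1+k_2)l] = I_{k_1+k_2}$, and so $\lceil \lceil p_1 \rceil + p_2 \rceil = (k_1+k_2) l$. The proof is now by induction on the depth (number of left brackets) of the formula.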
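A quick numerical check of Proposition 3 with exact rationals (helper names are hypothetical): rounding the running total after every addition gives the same result as summing the individually rounded values.

```python
from fractions import Fraction
import math
import random


def ceil_grid(x, l):
    """Right extremity of the grid subinterval ](k-1)l, kl] containing x (0 maps to 0)."""
    return Fraction(0) if x == 0 else math.ceil(x / l) * l


def nested_rounding(ps, l):
    """Left-hand side of Proposition 3: round the running total after each addition."""
    acc = Fraction(0)
    for x in ps:
        acc = ceil_grid(acc + x, l)
    return acc


random.seed(1)
l = Fraction(3, 4)
ps = [Fraction(random.randint(1, 50), 9) for _ in range(10)]
print(nested_rounding(ps, l) == sum(ceil_grid(x, l) for x in ps))
```

Note that rounding only the plain sum is different: with $l = 1$ and $p_1 = p_2 = 1/2$, the rounded value of $p_1 + p_2 = 1$ is $1$, whereas both sides of Proposition 3 equal $2$.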

In the sequel we denote $Dyna(T, \varepsilon) = s_{dyna}$. Let $T_{opt} = \min_s T(s)$ over all the schedules $s$.

An interesting question is to consider the case $T = T_{opt}$ and to determine whether it is necessary to know the value of $T_{opt}$ to obtain an approximate schedule. Theorem 2 shows that this is not the case.

Let $d_j = \min_i p_{ij}$ be the minimum processing time of job $j$, and $D = \sum_j d_j$. We assume that $D > 0$, otherwise the problem is trivial (see Remark 1). We have the following proposition.

Proposition 4. We have $D/m \le T_{opt} \le D$.

Proof. To obtain a schedule $s$ with a makespan of at most $D$, it is sufficient to put each job on the machine where its processing time is the smallest. The lower bound corresponds to the ideal case where the previous schedule $s$ gives each machine the same load: any schedule has a total processing time of at least $\sum_j d_j = D$, so some machine has a load of at least $D/m$.
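For small instances, Proposition 4 can be illustrated by computing $T_{opt}$ by brute force over all $m^n$ assignments. The instance below is made up for illustration (three machines, four jobs, `p[i][j]` is the processing time of job $j$ on machine $i$); there $D = 8$ and $T_{opt} = 3$, which lies between $D/m = 8/3$ and $D$.

```python
from itertools import product


def t_opt(p):
    """Minimum makespan by brute force over all m**n assignments (tiny instances only)."""
    m, n = len(p), len(p[0])
    best = None
    for assign in product(range(m), repeat=n):
        loads = [0] * m
        for j, i in enumerate(assign):
            loads[i] += p[i][j]
        best = max(loads) if best is None else min(best, max(loads))
    return best


p = [[4, 2, 7, 3], [5, 1, 2, 6], [3, 3, 8, 2]]  # m = 3 machines, n = 4 jobs
D = sum(min(p[i][j] for i in range(3)) for j in range(4))  # D = sum of d_j
print(D / 3, t_opt(p), D)
```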



Theorem 2. Without knowing the value $T_{opt}$, we can obtain, in $O((n/\varepsilon)^{m+1} \log m)$ time, a schedule $s^*$ such that $T_{opt} \le T(s^*) \le (1+\varepsilon)T_{opt}$ and $C(s^*) \le C_{opt}(T_{opt})$.

Proof. Using Proposition 4, we know that the makespan $T_{opt}$ lies in the interval $[D/m, D]$. We subdivide the interval $[D/m, D[$ into $k = \lceil \log m / \log \sqrt{1+\varepsilon} \rceil = O(\log m / \varepsilon)$ subintervals $J_1, \dots, J_k$, each one having a length $\sqrt{1+\varepsilon}$ times greater than its predecessor, with $J_1 = [D/m, \sqrt{1+\varepsilon}\, D/m[$. Then we apply our dynamic programming algorithm with $0 < \varepsilon' \le \sqrt{1+\varepsilon} - 1$ on the left extremity $x_i$ of each subinterval $J_i$, $i = 1, \dots, k$. Let $s_\alpha = Dyna(x_\alpha, \varepsilon')$ and $s_{\alpha+1} = Dyna(x_{\alpha+1}, \varepsilon')$ be the first two solutions returned by our dynamic programming algorithm, i.e. $Dyna(x_i, \varepsilon') = \emptyset$ for $i = 1, \dots, \alpha - 1$. Then $s^*$ is the solution among $s_\alpha$ and $s_{\alpha+1}$ which has the minimum cost. To see this, consider Figure 3. The value $T_{opt}$ lies in some subinterval $J_\alpha = [x_\alpha, \sqrt{1+\varepsilon}\, x_\alpha[$. Let $s_{opt}$ be a minimum makespan schedule, $T(s_{opt}) = T_{opt}$. Then, using Proposition 1, we have

$$\lceil T \rceil(s_{opt}) \le T_{opt} + \varepsilon' T \le (1+\varepsilon')T_{opt} \le \sqrt{1+\varepsilon}\, T_{opt} < (1+\varepsilon)x_\alpha,$$

where the second inequality uses $T = x_\alpha \le T_{opt}$ and the last one uses $T_{opt} < \sqrt{1+\varepsilon}\, x_\alpha$. Therefore either $\lceil T \rceil(s_{opt}) \in J_\alpha$ or $\lceil T \rceil(s_{opt}) \in J_{\alpha+1}$. Now Theorem 1 shows that $C(s^*) \le C_{opt}(T_{opt})$, and we have

$$T(s^*) \le \sqrt{1+\varepsilon}\, x_\alpha (1 + 2\varepsilon') \le \sqrt{1+\varepsilon}\, x_\alpha \left(2\sqrt{1+\varepsilon} - 1\right) = 2x_\alpha(1+\varepsilon) - x_\alpha\sqrt{1+\varepsilon} \le 2(1+\varepsilon)T_{opt} - T_{opt} = (1+2\varepsilon)T_{opt}.$$

Now using $\varepsilon/2$ instead of $\varepsilon$ yields the result.

Fig. 3. An illustration of the proof of Theorem 2
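The geometric subdivision and the final chain of inequalities in the proof can be checked numerically. This sketch (floating point, hypothetical helper names) builds the left extremities $x_i = (D/m)\sqrt{1+\varepsilon}^{\,i-1}$ and the bound $\sqrt{1+\varepsilon}\,x_\alpha(1+2\varepsilon')$ with the largest admissible value $\varepsilon' = \sqrt{1+\varepsilon} - 1$.

```python
import math


def grid_left_extremities(D, m, eps):
    """Left extremities x_1 < ... < x_k of the subdivision of [D/m, D[ into
    subintervals, each sqrt(1 + eps) times longer than its predecessor."""
    r = math.sqrt(1 + eps)
    k = math.ceil(math.log(m) / math.log(r))
    return [D / m * r ** i for i in range(k)]


def makespan_bound(x_alpha, eps):
    """Upper bound on T(s*) from the proof, with eps' = sqrt(1 + eps) - 1."""
    eps_p = math.sqrt(1 + eps) - 1
    return (1 + 2 * eps_p) * math.sqrt(1 + eps) * x_alpha
```

Since $r^k \ge m$ for $r = \sqrt{1+\varepsilon}$ and $k = \lceil \log m / \log r \rceil$, the last subinterval reaches $D$; and for any $T_{opt} \in [x_\alpha, r x_\alpha[$ the bound stays below $(1+2\varepsilon)T_{opt}$.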

4 A FPTAS for $(\varepsilon, 0)$-Approximate Pareto Curves

In this section, we consider the notion of approximate Pareto curves […]: given an instance $x$ of a minimization multi-objective problem, with objective functions $f_i$, $i = 1, \dots, k$, and an $\varepsilon > 0$, an $\varepsilon$-approximate Pareto curve, denoted $P_\varepsilon(x)$, is a set of solutions $s$ such that there is no other solution $s'$ such that, for all $s \in P_\varepsilon(x)$, $(1+\varepsilon) f_i(x, s') < f_i(x, s)$ for some $i$.

Our goal is to get a stronger result for our bicriteria problem. Informally, we search for a set of solutions that approximately dominates all other solutions with respect to the makespan criterion, and absolutely dominates all other solutions with respect to the cost criterion. In other words, for every other solution $s'$, with values $T(s')$ and $C(s')$, the set contains a solution $s$ such that $T(s) \le (1+\varepsilon)T(s')$ and $C(s) \le C(s')$. In order to do so, we introduce the definition of an $(\varepsilon, 0)$-approximate Pareto curve: given an instance $x$ of a bicriteria minimization problem, an $(\varepsilon, 0)$-approximate Pareto curve, denoted $P_{\varepsilon,0}(x)$, is a set of solutions $s$ such that there is no other solution $s'$ such that, for all $s \in P_{\varepsilon,0}(x)$: $(1+\varepsilon)T(s', x) < T(s, x)$ or $C(s', x) < C(s, x)$.

A solution $s$ is said to $(\varepsilon, 0)$-approximately dominate a solution $s'$ if $T(s, x) \le (1+\varepsilon)T(s', x)$ and $C(s, x) \le C(s', x)$. Equivalently, $s'$ is said to be $(\varepsilon, 0)$-approximately dominated by $s$. In other words, an $(\varepsilon, 0)$-approximate Pareto curve $P_{\varepsilon,0}(x)$ is a set of solutions such that any solution is $(\varepsilon, 0)$-approximately dominated by some solution in $P_{\varepsilon,0}(x)$.
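The $(\varepsilon, 0)$-dominance relation and the defining property of $P_{\varepsilon,0}(x)$ translate directly into code. In this sketch a solution is represented only by its pair (makespan, cost); the function names are hypothetical.

```python
def approx_dominates(a, b, eps):
    """True if a = (T_a, C_a) (eps, 0)-approximately dominates b = (T_b, C_b)."""
    return a[0] <= (1 + eps) * b[0] and a[1] <= b[1]


def is_eps0_pareto_curve(curve, solutions, eps):
    """Every solution must be (eps, 0)-approximately dominated by a point of the curve."""
    return all(any(approx_dominates(s, t, eps) for s in curve) for t in solutions)
```

For example, with $\varepsilon = 0.1$ the single point (10, 5) covers the solution (9.5, 6), since $10 \le 1.1 \cdot 9.5$ and $5 \le 6$, although it does not dominate it exactly.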

In the following we give an algorithm which constructs, in $O(n(n/\varepsilon)^{m+1})$ time, an $(\varepsilon, 0)$-approximate Pareto curve for the problem that we consider.

The algorithm. Let us denote by $M_j$ the machine on which job $j$ has the minimum cost. Let $C_{min} = \sum_j c_{M_j j}$ and $D = \sum_j p_{M_j j}$. Let $s_{min}$ be a schedule with



204 E. Angel, E. Bampis, and A. Kononov 



cost $C_{min}$. It is clear that $T(s_{min}) \le D$. We subdivide the interval $[D/m, D]$ into $k = O(n/\varepsilon)$ geometric subintervals, the left extremity of each subinterval being $1+\varepsilon$ times that of its predecessor. Then we apply the algorithm $Dyna(T, \varepsilon)$ on the left extremity of each subinterval and get a schedule. We also compute $Dyna(D, \varepsilon)$. Among the schedules obtained, we keep an undominated subset $\Pi$.
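The last step, keeping an undominated subset $\Pi$ of the computed schedules, can be done by sorting. A minimal sketch on (makespan, cost) pairs with exact (not approximate) dominance; the helper name is hypothetical:

```python
def undominated(points):
    """Keep the (makespan, cost) pairs not dominated by another pair (duplicates kept once)."""
    result = []
    for t, c in sorted(set(points)):
        # Points arrive by increasing makespan (ties by cost): a point survives
        # iff it is strictly cheaper than every point kept so far.
        if not result or c < result[-1][1]:
            result.append((t, c))
    return result
```

For example, among (5, 10), (6, 8), (7, 9), (5, 12) and (8, 2), only (5, 10), (6, 8) and (8, 2) are kept.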

Theorem 3. The set $\Pi$ is an $(\varepsilon, 0)$-approximate Pareto curve, and can be obtained in $O(n(n/\varepsilon)^{m+1})$ time. For the problem with rejections allowed, the running time is $O(n(n/\varepsilon)^{m+2})$.

Proof. Let $s$ be a schedule with $T(s) \le D$. Let $t_k \le T(s) < t_{k+1}$, where $t_k$ is the left extremity of the $k$-th subinterval (we have $t_1 = D/m$ and $t_2 = (1+\varepsilon)D/m$). By Theorem 1, there is $s^* \in \Pi$ such that $T(s^*) \le (1+2\varepsilon)t_k \le (1+2\varepsilon)T(s)$ and $C(s^*) \le C(s)$. If $T(s) > D$, then $T(s) > T(s_{min})$ and $C(s) \ge C_{min}$. We know that there is $s^* \in \Pi$ such that $T(s^*) \le (1+2\varepsilon)T(s_{min}) \le (1+2\varepsilon)T(s)$ and $C(s^*) = C(s_{min}) \le C(s)$. So, by definition of an $(\varepsilon, 0)$-approximate Pareto curve, $\Pi$ is a $(2\varepsilon, 0)$-approximate Pareto curve. Now using $\varepsilon/2$ instead of $\varepsilon$ yields the result.

Now notice that if $D = 0$, the Pareto curve is a single point. Otherwise, if a schedule of makespan 0 exists, we use Remark 1 to obtain the first point of the Pareto curve, and we run the algorithm using the interval $[\min_{p_{ij} > 0} p_{ij}, D]$, instead of $[D/m, D]$, to obtain the other points.

Remark 3. Given the polynomially sized $(\varepsilon, 0)$-approximate Pareto curve, it is easy to get a schedule whose sum of the makespan and the cost of rejected jobs is within a factor $1+\varepsilon$ of the minimum, by inspecting all points on the Pareto curve and taking one which minimizes this sum.
Abstract. In this paper we describe a general grouping technique to devise faster and simpler approximation schemes for several scheduling problems. We illustrate the technique on two different scheduling problems: scheduling on unrelated parallel machines with costs and the job shop scheduling problem. The time complexity of the resulting approximation schemes is always linear in the number $n$ of jobs, and the multiplicative constant hidden in the $O(n)$ running time is reasonably small and independent of the error $\varepsilon$.

1 Introduction 

The problem of scheduling jobs on machines such that the maximum completion time (makespan) is minimized has been extensively studied for various problem formulations. Recently, several polynomial time approximation schemes have been found for various shop and multiprocessor scheduling problems […]: these include scheduling on unrelated machines, multiprocessor tasks (e.g. dedicated, parallel, malleable tasks), and classical open, flow and job shops. These results were extended in […] by providing a polynomial time approximation scheme for a general multiprocessor job shop scheduling problem (containing the above problems as special cases).

The goal of this paper is to show that a general and powerful technique 
can be applied to speed up and enormously simplify all the previous algorithms 
for the aforementioned scheduling problems. The basic idea is to reduce the 
number of jobs to a constant and to apply enumeration or dynamic programming 
afterwards. The reduced set of jobs is computed by first structuring the input 
and then grouping jobs that have the same structure. In order to show that our 
algorithm is an approximation scheme, we prove that two linear programming 
formulations (one for the original instance and one for the transformed instance) 
have a gap of at most $\varepsilon \cdot OPT$, where $OPT$ is the minimum objective value.

* Supported by Swiss National Science Foundation project 21-55778.98, “Resource Allocation and Scheduling in Flexible Manufacturing Systems”, by the “Metaheuristics Network”, grant HPRN-CT-1999-00106, and by the “Thematic Network APPOL”, Approximation and on-line algorithms, grant IST-1999-14084.
Due to space limitations, we focus our attention on two scheduling problems: scheduling on unrelated machines with costs and the classical job shop scheduling problem. Although these two problems are different in nature (the first is an assignment problem while the other is an ordering problem), the reader will recognize that the underlying ideas are very similar. Furthermore, the described techniques can also be applied to many other scheduling problems, including the general multiprocessor job shop scheduling problem studied in […]. The overall running time is always $O(n) + C$, where the constant hidden in $O(n)$ is reasonably small and independent of the accuracy $\varepsilon$, whereas the additive constant $C$ depends on the number of machines and the accuracy $\varepsilon$ (and on the number of operations for job shops). The full details of these extensions will be given in the long version of this paper. Note that the time complexity of these PTASs is best possible with respect to the number of jobs. Moreover, the existence of PTASs for these strongly NP-hard problems whose running time is also polynomial in the reciprocal value of the precision and the number of machines would imply P=NP […].

Scheduling unrelated parallel machines with costs. We begin with the problem of scheduling a set $J = \{J_1, \dots, J_n\}$ of $n$ independent jobs on a set $M = \{1, \dots, m\}$ of $m$ unrelated parallel machines. Each machine can process at most one job at a time, and each job has to be processed without interruption by exactly one machine. Processing job $J_j$ on machine $i$ requires $p_{ij} \ge 0$ time units and incurs a cost $c_{ij} \ge 0$, $i = 1, \dots, m$, $j = 1, \dots, n$. The makespan is the maximum job completion time among all jobs. We consider the problem of minimizing the objective function that is a weighted sum of the makespan and total cost.

When $c_{ij} = 0$ the problem turns into the classical makespan minimization form. Lenstra, Shmoys and Tardos […] gave a polynomial-time 2-approximation algorithm for this problem, and this is the best approximation ratio currently known to be achievable in polynomial time. They also proved that for any positive $\varepsilon < 1/2$, no polynomial-time $(1+\varepsilon)$-approximation algorithm exists, unless P=NP. Furthermore, Shmoys and Tardos […] gave a polynomial-time 2-approximation algorithm for the general variant with cost. Since the problem is NP-hard even for $m = 2$, it is natural to ask how well the optimum can be approximated when there is only a constant number of machines. In contrast to the previously mentioned inapproximability result for the general case, there exists a fully polynomial-time approximation scheme for the problem when $m$ is fixed. Horowitz and Sahni […] proved that for any $\varepsilon > 0$, an $\varepsilon$-approximate solution can be computed in $O(nm(nm/\varepsilon)^{m-1})$ time, which is polynomial in both $n$ and $1/\varepsilon$ if $m$ is constant. Lenstra, Shmoys and Tardos […] also gave an approximation scheme for the problem with running time bounded by the product of $(n+1)^{m/\varepsilon}$ and a polynomial of the input size. Even though for fixed $m$ their algorithm is not fully polynomial, it has a much smaller space complexity than the one in […]. Recently, Jansen and Porkolab [7] presented a fully polynomial-time approximation scheme for the problem whose running time is $n(m/\varepsilon)^{O(m)}$. Their algorithm has to solve at least $(m^2/\varepsilon^2)^m$ many linear programs. In order to obtain a linear running time for the case when $m$ is fixed, they use the price-directive decomposition method proposed by Grigoriadis and Khachiyan […] for computing approximate solutions of block structured convex programs. The final ingredient is an intricate rounding technique based on the solution of a linear program and a partition of the job set.

In contrast to the previous approach [7], our algorithm (which also works for the general variant with cost) is extremely simple: first we preprocess the data to obtain a new instance with $\min\{n, (\log m/\varepsilon)^{O(m)}\}$ grouped jobs. The preprocessing step requires linear time. Then, using dynamic programming, we compute an approximate solution for the grouped jobs in $(\log m/\varepsilon)^{O(m^2)}$ time. Both steps together imply a fully polynomial-time approximation scheme that runs in $O(n) + C$ time, where $C = (\log m/\varepsilon)^{O(m^2)}$. We remark that the multiplicative constant hidden in the $O(n)$ running time of our algorithm is reasonably small and does not depend on the accuracy $\varepsilon$.

Makespan minimization in job shops. In the job shop scheduling problem, there is a set $J = \{J_1, \dots, J_n\}$ of $n$ jobs that must be processed on a given set $M = \{1, \dots, m\}$ of $m$ machines. Each job $J_j$ consists of a sequence of $\mu$ operations $O_{1j}, O_{2j}, \dots, O_{\mu j}$ that need to be processed in this order. Operation $O_{ij}$ must be processed without interruption on machine $m_{ij} \in M$, during $p_{ij}$ time units. Each machine can process at most one operation at a time, and each job may be processed by at most one machine at any time. For any given schedule, let $C_{ij}$ be the completion time of operation $O_{ij}$. The objective is again to find a schedule that minimizes the makespan (the maximum completion time $C_{max} = \max_{ij} C_{ij}$).

The job shop problem is strongly NP-hard even if each job has at most three operations and there are only two machines […]. Williamson et al. […] proved that when the number of machines, jobs, and operations per job are part of the input, there does not exist a polynomial time approximation algorithm with worst case bound smaller than $5/4$ unless P = NP. When $m$ and $\mu$ are part of the input, the best known result is an approximation algorithm with worst case bound $O\big((\log(m\mu)\log(\min\{m\mu, p_{max}\}) / \log\log(m\mu))^2\big)$, where $p_{max}$ is the largest processing time among all operations. For those instances where $m$ and $\mu$ are fixed (the restricted case we are focusing on in this paper), Shmoys et al. […] gave approximation algorithms that compute $(2+\varepsilon)$-approximate solutions in polynomial time for any fixed $\varepsilon > 0$. This result has recently been improved by Jansen et al. […], who have shown that $(1+\varepsilon)$-approximate solutions of the problem can be computed in polynomial time. The main idea is to divide the set of jobs $J$ into two groups $L$ and $S$ formed by jobs with "large" and "small" total processing time, respectively. The total number of large jobs is bounded by a constant that is exponential in $m$, $\mu$ and $\varepsilon$. Then they construct all possible schedules for the large jobs. In any schedule for the large jobs, the starting and completion times of the jobs define a set of time intervals, into which the set of small jobs has to be scheduled. Then, for every possible job ordering of large jobs, a linear program is used to assign small jobs to time intervals. The rounded solution of the linear program, in combination with an algorithm by Sevastianov […], gives an approximate schedule in time polynomial in $n$. In order to speed up the whole algorithm, they suggest […] a number of improvements: they use the logarithmic potential price decomposition method of Grigoriadis and Khachiyan […] to compute an approximate solution of the linear program in linear time, and a novel rounding procedure to bring down to a constant the number of fractional assignments in any solution of the linear program. The overall running time is $O(n)$, where the multiplicative constant hidden in the $O(n)$ running time is exponential in $m$, $\mu$ and $\varepsilon$.

In contrast to the approximation schemes introduced in [10, 11], our algorithm is again extremely simple and even faster. We show that the input can be preprocessed in linear time to obtain a new instance with a constant number of grouped jobs. This immediately gives a linear time approximation scheme with running time $O(n) + C$, where $C$ is a constant that depends on $m$, $\varepsilon$ and $\mu$. Again, we remark that the multiplicative constant hidden in the $O(n)$ running time of our algorithm is reasonably small and does not depend on the accuracy $\varepsilon$.

Throughout this paper, we use several transformations which may potentially increase the objective function value by a factor of $1 + O(\varepsilon)$. Therefore we can perform a constant number of them while still staying within $1 + O(\varepsilon)$ of the original optimum. When we describe this type of transformation, we shall say it produces $1 + O(\varepsilon)$ loss.

2 Scheduling Unrelated Parallel Machines with Costs 

Let $0 < \varepsilon < 1$ be an arbitrarily small rational number, and let $m \ge 2$ be an integral value. Throughout this section, the values $\varepsilon$ and $m$ are considered to be constants and not part of the input.

The problem can be stated by using the following integer linear program ILP that represents the problem of assigning jobs to machines ($x_{ij} = 1$ means that job $J_j$ has been assigned to machine $i$, and $\mu$ is any given positive weight):

$$\begin{aligned}
\min\ & T + \mu \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} c_{ij} \\
\text{s.t. } & \sum_{j=1}^{n} x_{ij} p_{ij} \le T, && i = 1, \dots, m; \\
& \sum_{i=1}^{m} x_{ij} = 1, && j = 1, \dots, n; \\
& x_{ij} \in \{0, 1\}, && i = 1, \dots, m,\ j = 1, \dots, n.
\end{aligned}$$

The first set of constraints relates the makespan $T$ to the processing time on each of the machines, while the second set ensures that every job gets assigned.

We begin by computing some lower and upper bounds for the minimum objective value $OPT$. By multiplying each cost value by $\mu$ we may assume, w.l.o.g., that $\mu = 1$. Let $d_j = \min_{i=1,\dots,m}(p_{ij} + c_{ij})$, and $D = \sum_{j=1}^{n} d_j$. Consider an optimum assignment $(x^*_{ij})$ of jobs to machines with makespan $T^*$ and total cost $C^*$. Then

$$D = \sum_{j=1}^{n} d_j \le \sum_{i=1}^{m}\sum_{j=1}^{n} x^*_{ij} c_{ij} + \sum_{i=1}^{m}\sum_{j=1}^{n} x^*_{ij} p_{ij} \le C^* + m \cdot T^* \le m \cdot OPT,$$

where $OPT = T^* + C^*$. On the other hand, we can generate a feasible schedule according to the $d_j$ values. Indeed, let $m_j \in \{1, \dots, m\}$ denote any machine such that $d_j = p_{m_j j} + c_{m_j j}$. Assign every job $J_j$ to machine $m_j$. The objective value of this schedule can be bounded by $\sum_j c_{m_j j} + \sum_j p_{m_j j} = D$.
Therefore, $OPT \in [D/m, D]$, and by dividing all processing times and cost values by $D/m$, we directly get the following bounds for the optimum value:

$$1 \le OPT \le m.$$
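The bounds above can be sketched as follows, assuming weight $\mu = 1$, integer data and $D > 0$; `bounds_and_normalize` is a hypothetical name. It computes the machines $m_j$, the value $D$, the objective value of the schedule that assigns every job $J_j$ to $m_j$ (at most $D$), and the instance scaled by $m/D$, for which $1 \le OPT \le m$.

```python
from fractions import Fraction


def bounds_and_normalize(p, c):
    """p[i][j], c[i][j]: integer processing time and cost of job j on machine i."""
    m, n = len(p), len(p[0])
    # m_j: a machine minimizing p_ij + c_ij, so that d_j = p[m_j][j] + c[m_j][j].
    mj = [min(range(m), key=lambda i: p[i][j] + c[i][j]) for j in range(n)]
    D = sum(p[mj[j]][j] + c[mj[j]][j] for j in range(n))
    loads = [0] * m
    for j, i in enumerate(mj):
        loads[i] += p[i][j]
    value = max(loads) + sum(c[mj[j]][j] for j in range(n))  # makespan + cost
    scale = Fraction(m, D)  # dividing by D/m; assumes D > 0
    p_s = [[x * scale for x in row] for row in p]
    c_s = [[x * scale for x in row] for row in c]
    return mj, value, D, p_s, c_s
```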

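Fast, slow, cheap and expensive machines. For each job $J_j$, we define four sets of machines: the set $\mathcal{F}_j = \{i : p_{ij} \le \frac{\varepsilon}{m} d_j\}$ of fast, the set $\mathcal{C}_j = \{i : c_{ij} \le \frac{\varepsilon}{m} d_j\}$ of cheap, the set $\mathcal{S}_j = \{i : p_{ij} > \frac{m}{\varepsilon} d_j\}$ of slow and the set $\mathcal{E}_j = \{i : c_{ij} > d_j/\varepsilon\}$ of expensive machines, respectively. Then, we can prove the following lemma.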

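Lemma 1. With $1 + 3\varepsilon$ loss, we can assume that for each job $J_j$ the following holds:

- $p_{ij} = 0$, for $i \in \mathcal{F}_j$, and $c_{ij} = 0$, for $i \in \mathcal{C}_j$;
- $p_{ij} = +\infty$, for $i \in \mathcal{S}_j$, and $c_{ij} = +\infty$, for $i \in \mathcal{E}_j$;
- for any other machine $i$, $p_{ij} = \frac{\varepsilon}{m} d_j (1+\varepsilon)^{\pi_i}$ and $c_{ij} = \frac{\varepsilon}{m} d_j (1+\varepsilon)^{\gamma_i}$, where $\pi_i, \gamma_i \in \mathbb{N}$.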



Proof. Set $p_{ij} = 0$ for every $i \in \mathcal{F}_j$ and $j = 1, \dots, n$, and consider an optimal assignment $A : J \to M$ of jobs to machines for this instance. Clearly, the optimal value corresponding to $A$ cannot be larger than $OPT$. Let $F$ denote the set of jobs which are processed on fast machines according to $A$. Now, if we replace the processing times of the transformed instance by the original processing times, we may potentially increase the makespan of $A$ by at most $\sum_{J_j \in F} \frac{\varepsilon}{m} d_j \le \frac{\varepsilon}{m} D = \varepsilon \le \varepsilon \cdot OPT$, using the normalization $D = m$ and $OPT \ge 1$. The statement for the cost values follows in a similar way.

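Now we show that there exists an approximate schedule where jobs are scheduled neither on slow nor on expensive machines. This allows us to set $p_{ij} = +\infty$ for $i \in \mathcal{S}_j$, and $c_{ij} = +\infty$ for $i \in \mathcal{E}_j$. Consider an optimal assignment $A : J \to M$ of jobs to machines, with $T^*$ and $C^*$ denoting the resulting makespan and cost, respectively. Let $S$ and $E$ represent, respectively, the set of jobs which are processed on slow and on expensive machines according to $A$. Then, assign every job $J_j \in S \cup E$ to machine $m_j$ (recall that $m_j \in \{1, \dots, m\}$ denotes any machine such that $d_j = p_{m_j j} + c_{m_j j}$). Moving jobs $J_j \in S \cup E$ onto machines $m_j$ may potentially increase the objective value by at most

$$\sum_{J_j \in S \cup E} d_j \le \frac{\varepsilon}{m} \sum_{J_j \in S} p_{A(j), j} + \varepsilon \sum_{J_j \in E} c_{A(j), j} \le \varepsilon T^* + \varepsilon C^*,$$

since $p_{A(j), j} > \frac{m}{\varepsilon} d_j$ for $J_j \in S$ and $c_{A(j), j} > d_j/\varepsilon$ for $J_j \in E$ (and the total load $\sum_{J_j \in S} p_{A(j), j}$ is at most $m T^*$).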

By the above arguments, all the positive costs $c_{ij}$ and positive processing times $p_{ij}$ are greater than $\frac{\varepsilon}{m} d_j$. Round every positive processing time $p_{ij}$ and every positive cost $c_{ij}$ down to the nearest value of the form $\frac{\varepsilon}{m} d_j (1+\varepsilon)^h$, for $h \in \mathbb{N}$, $i = 1, \dots, m$ and $j = 1, \dots, n$. Consider the optimal value of the rounded instance. Clearly, this value cannot be greater than $OPT$. It follows that by replacing the rounded values with the original ones we may increase each value by a factor $1+\varepsilon$, and consequently, the solution value potentially increases by the same factor $1+\varepsilon$. □



We define the execution profile of a job $J_j$ to be an $m$-tuple $\langle \pi_1, \dots, \pi_m \rangle$ such that $p_{ij} = \frac{\varepsilon}{m} d_j (1+\varepsilon)^{\pi_i}$. We adopt the convention that $\pi_i = +\infty$ if $p_{ij} = +\infty$, and $\pi_i = -\infty$ if $p_{ij} = 0$. Likewise, we define the cost profile of a job $J_j$ to be an $m$-tuple $\langle \gamma_1, \dots, \gamma_m \rangle$ such that $c_{ij} = \frac{\varepsilon}{m} d_j (1+\varepsilon)^{\gamma_i}$. Again, we adopt the convention that $\gamma_i = +\infty$ if $c_{ij} = +\infty$, and $\gamma_i = -\infty$ if $c_{ij} = 0$. Let us say that two jobs have the same profile iff they have the same execution and cost profile.

Lemma 2. The number of distinct profiles is bounded by $\ell := \lceil 2 + 2\log_{1+\varepsilon}(m/\varepsilon) \rceil^{2m}$.



Grouping jobs. Let $\delta := \varepsilon/m$, and partition the set of jobs into two subsets $L = \{J_j : d_j > \delta\}$ and $S = \{J_j : d_j \le \delta\}$. Let us say that $L$ is the set of large jobs, while $S$ is the set of small jobs. We further partition the set of small jobs into subsets $S_\phi$ of jobs having the same profile, for $\phi = 1, \dots, \ell$. Let $J_a$ and $J_b$ be two jobs from $S_\phi$ such that $d_a, d_b \le \delta/2$. We "group" together these two jobs to form a composed job $J_c$ in which the processing time (and cost) on machine $i$ is equal to the sum of the processing times (and costs) of $J_a$ and $J_b$ on machine $i$, and let $d_c = d_a + d_b$ (since $J_a$ and $J_b$ have the same profile, their processing times and costs are proportional to $d_a$ and $d_b$, so $d_c$ is indeed the minimum of processing time plus cost of $J_c$). We repeat this process, by using the modified set of jobs, until at most one job $J_j$ from $S_\phi$ has $d_j \le \delta/2$. At the end, all jobs $J_j$ in group $S_\phi$ have $d_j \le \delta$. The same procedure is performed for all other subsets $S_\phi$. At the end of this process, there are at most $\ell$ jobs, one for each subset $S_\phi$, having $d_j \le \delta/2$. All the other jobs have $d_j > \delta/2$. Therefore, since $D = m$ after normalization, the number of jobs in the transformed instance is bounded by $2D/\delta + \ell \le 2m^2/\varepsilon + \ell = (\log m/\varepsilon)^{O(m)}$. Note that the procedure runs in linear time, and a feasible schedule for the original set of jobs can be easily obtained from a feasible schedule for the grouped jobs. We motivate the described technique with the following

Lemma 3. With $1+\varepsilon$ loss, the number of jobs can be reduced in linear time to at most $\min\{n, (\log m/\varepsilon)^{O(m)}\}$.
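One linear-time way to carry out the pairwise grouping of a single set $S_\phi$ is to keep one open composed job and close it as soon as its $d$ exceeds $\delta/2$; each merge combines two jobs with $d \le \delta/2$, exactly as in the procedure above. This sketch is our illustration, not necessarily the authors' implementation; jobs are $(d_j$, processing-time vector, cost vector$)$ triples sharing one profile, and the helper name is hypothetical.

```python
def group_small_jobs(jobs, delta):
    """Group the small jobs of one set S_phi (all with d_j <= delta).

    jobs: list of (d, p_vec, c_vec). Jobs with d <= delta/2 are merged by
    adding their vectors componentwise until the composed job exceeds delta/2;
    at most one composed job ends with d <= delta/2.
    """
    out, cur = [], None
    for d, p, c in jobs:
        if d > delta / 2:
            out.append((d, p, c))  # already larger than delta/2: keep as is
            continue
        if cur is None:
            cur = (d, p, c)
        else:  # merge two jobs with d <= delta/2; the result has d <= delta
            cur = (cur[0] + d,
                   [a + b for a, b in zip(cur[1], p)],
                   [a + b for a, b in zip(cur[2], c)])
        if cur[0] > delta / 2:
            out.append(cur)
            cur = None
    if cur is not None:
        out.append(cur)  # the single leftover job with d <= delta/2
    return out
```

With $\delta = 1$ and values $d = 0.2, 0.3, 0.1, 0.7, 0.25, 0.05$, the sketch produces composed jobs with $d = 0.6$, $0.7$ and $0.3$: the total is preserved, every job has $d \le \delta$, and only one has $d \le \delta/2$.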

Proof. Consider the transformed instance $I'$ according to Lemma 1 (small jobs are not yet grouped). Assume, w.l.o.g., that there exists a solution $SOL'$ for $I'$ of value $OPT'$. It is sufficient to show that, by using the small jobs grouped as described previously, there exists a schedule of value at most $(1+\varepsilon)OPT'$.

According to $SOL'$, we may assume that each machine executes the large jobs at the beginning of the schedule. Let $c$ denote the total cost of large jobs when processed according to $SOL'$, and let $t_i$ denote the time at which machine $i$ finishes processing large jobs, for $i = 1, \dots, m$. Now, consider the following linear program $LP_1$:

$$\begin{aligned}
\min\ & T + c + \sum_{J_j \in S} \sum_{i=1}^{m} x_{ij} \tfrac{\varepsilon}{m} d_j (1+\varepsilon)^{\gamma_i} \\
\text{s.t. } & t_i + \sum_{J_j \in S} x_{ij} \tfrac{\varepsilon}{m} d_j (1+\varepsilon)^{\pi_i} \le T, && i = 1, \dots, m; \\
& \sum_{i=1}^{m} x_{ij} = 1, && J_j \in S; \\
& x_{ij} \ge 0, && i = 1, \dots, m,\ J_j \in S.
\end{aligned}$$

Note that $LP_1$ formulates the linear relaxation of the original problem ILP for the subset of small jobs: we are assuming that machine $i$ can start processing small jobs only at time $t_i$, and that the processing times and costs are structured as in Lemma 1 (here $\pi_i$ and $\gamma_i$ refer to the profile of job $J_j$).

For each S^, 4> = consider a set of decision variables y^i S [0, 1] for 

i = 1, The meaning of these variables is that y^i represents the fraction 

of jobs from 5'^ processed on machine i. Consider the following linear program 

LP2: 



min T + c + Y.T=i 

s-t- k + + < T, i = 



ytpi ^ O7 



i = 1 , . . . ,m, (j) = 



1, . . . ,m; 



By setting

$$y_{\phi i} = \frac{\sum_{J_j \in S_\phi} x_{ij}\, d_j}{\sum_{J_j \in S_\phi} d_j},$$

it is easy to check (jobs with the same profile have processing times and costs proportional to $d_j$ on every machine) that any feasible set of values $(x_{ij})$ for LP1 gives a feasible set of values $(y_{\phi i})$ for LP2; furthermore, with these settings, the objective function value of LP2 is equal to that of LP1. But LP1 is a relaxation of the original problem. Therefore, if we were able, by using the grouped small jobs, to get a feasible schedule of value at most $1+\varepsilon$ times the optimal value of LP2, we would be done. In the remainder we show how to generate such a schedule from the solution of LP2 and the grouped small jobs.
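As a sanity check of the substitution above, the following Python sketch verifies on a toy profile that taking $y_{\phi i}$ as the $d_j$-weighted average of the $x_{ij}$ reproduces the LP1 load and cost on every machine. It assumes, as the argument needs, that within one profile both $p_{ij}$ and $c_{ij}$ equal a machine-dependent factor times $d_j$; the factors `P`, `C` and all data are made up for illustration.

```python
from fractions import Fraction


def lp2_from_lp1(x, d):
    """Map LP1 fractions x[j][i] of the jobs of one profile to LP2 fractions y[i].

    y[i] = sum_j x[j][i] * d[j] / sum_j d[j]   (d_j-weighted average)
    """
    total = sum(d)
    machines = len(x[0])
    return [sum(x[j][i] * d[j] for j in range(len(d))) / total for i in range(machines)]


# Toy profile: p_ij = P[i] * d_j and c_ij = C[i] * d_j (assumed structure).
d = [Fraction(2), Fraction(3), Fraction(5)]
P = [Fraction(1), Fraction(3, 2)]   # hypothetical processing-time factors
C = [Fraction(4), Fraction(1)]      # hypothetical cost factors
x = [[Fraction(1), Fraction(0)],
     [Fraction(1, 3), Fraction(2, 3)],
     [Fraction(1, 2), Fraction(1, 2)]]

y = lp2_from_lp1(x, d)
for i in range(2):
    load_lp1 = sum(x[j][i] * P[i] * d[j] for j in range(3))
    load_lp2 = y[i] * sum(P[i] * dj for dj in d)
    cost_lp1 = sum(x[j][i] * C[i] * d[j] for j in range(3))
    cost_lp2 = y[i] * sum(C[i] * dj for dj in d)
    assert load_lp1 == load_lp2 and cost_lp1 == cost_lp2
assert sum(y) == 1
```

Exact `Fraction` arithmetic is used so that the equalities can be checked with `==` rather than with a floating-point tolerance.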

Let $y^*_{\phi i}$ denote the values of the variables $y_{\phi i}$ in the optimal solution of LP2. For every positive value $y^*_{\phi i}$, schedule a subset of grouped jobs from $S_\phi$ on machine $i$ until either (a) the jobs from $S_\phi$ are exhausted or (b) the total fraction of jobs assigned to $i$ is equal to $y^*_{\phi i}$ (if necessary, fractionalize one job to use up $y^*_{\phi i}$ exactly). We repeat this for the not yet assigned grouped small jobs and for every positive value $y^*_{\phi i}$. Note that if $y^*_{\phi i}$ is not fractional, then the jobs from $S_\phi$ are not preempted by this procedure. In general, the number of preempted jobs from $S_\phi$ is at most $f_\phi - 1$, where $f_\phi = |\{y^*_{\phi i} : y^*_{\phi i} > 0,\ i = 1,\ldots,m\}|$. Now remove all the preempted jobs $J_j$ and schedule them at the end on machines $m_j$. This increases the makespan and the cost by at most $\Delta = \delta \cdot \sum_{\phi=1}^{\ell} (f_\phi - 1)$, since every grouped small job has cost plus processing time bounded by $\delta$ when processed on machine $m_j$. A basic feasible solution of LP2 has the property that the number of positive variables is at most the number of rows in the constraint matrix, $m + \ell$; therefore $\sum_{\phi=1}^{\ell} (f_\phi - 1) \le m$. In order to bound the total increase $\Delta$ by $\varepsilon$ we have to choose $\delta$ such that $\delta \le \varepsilon/m$, and the claim follows. □
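The rounding step in the proof can be pictured as laying the grouped jobs of one profile end to end and cutting the line at the cumulative targets $y^*_{\phi i} \cdot \sum d_j$. The Python sketch below is our own illustration, not code from the paper: a job is reported as preempted exactly when one of the interior cut points falls strictly inside it, so at most $f_\phi - 1$ jobs are preempted. The function name and the order in which jobs are taken are assumptions.

```python
from fractions import Fraction
from itertools import accumulate


def split_by_fractions(sizes, fractions):
    """Assign grouped jobs of one profile to machines following LP2 fractions.

    sizes     -- d-values of the grouped jobs, in the order they are taken
    fractions -- y*_{phi i} for every machine i (non-negative, summing to 1)
    Returns (assigned, preempted): the jobs placed whole on each machine with
    a positive fraction, and the jobs cut by a machine boundary.
    """
    total = sum(sizes)
    machines = [i for i, y in enumerate(fractions) if y > 0]
    # Cumulative target of each machine; the last cut equals total.
    cuts = list(accumulate(fractions[i] * total for i in machines))
    assigned = {i: [] for i in machines}
    preempted = []
    start, t = 0, 0
    for job, size in enumerate(sizes):
        end = start + size
        while cuts[t] <= start:      # the current machine is already full
            t += 1
        if end <= cuts[t]:
            assigned[machines[t]].append(job)
        else:                        # an interior cut lies strictly inside the job
            preempted.append(job)
        start = end
    return assigned, preempted


sizes = [3, 1, 2, 2, 4]
fractions = [Fraction(1, 2), 0, Fraction(1, 4), Fraction(1, 4)]
assigned, preempted = split_by_fractions(sizes, fractions)
print(assigned, preempted)   # {0: [0, 1, 2], 2: [3], 3: []} [4]
```

Here $f_\phi = 3$ and a single job is preempted; in the proof such jobs are then moved to the end of the schedule on their machines $m_j$.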



By using a dynamic programming approach similar to that used in [3], it is possible to compute a $(1+\varepsilon)$-approximate schedule for the transformed instance in $(\log m/\varepsilon)^{\ldots}$ time. We omit the details in this extended abstract, and observe that by Lemma 3 and by adopting this dynamic programming approach, we have the following theorem.



Theorem 1. For the problem of minimizing the weighted sum of the cost and the makespan in scheduling $n$ jobs on $m$ unrelated machines ($m$ fixed), there exists a fully polynomial time approximation scheme that runs in $O(n) + (\log m/\varepsilon)^{\ldots}$ time.

3 Makespan Minimization in Job Shops

Let $\varepsilon > 0$ be an arbitrarily small rational number, and let $m \ge 2$ and $\mu \ge 1$ be integral values. Throughout this section, the values $\varepsilon$, $m$ and $\mu$ are considered to be constants and not part of the input. For simplicity, we assume that $1/\varepsilon$ is integral. We begin by providing some lower and upper bounds on the minimum makespan. Then we show how to reduce the number of jobs to a constant: the reduced set of jobs is computed in linear time by structuring the input and grouping jobs that share the same structure. This directly gives a linear time approximation scheme for makespan minimization in the job shop scheduling problem.

For a given instance of the job shop scheduling problem, the value of the optimum makespan will be denoted as OPT. Let $d_j = \sum_{i=1}^{\mu} p_{ij}$ be the total processing time of job $J_j \in \mathcal{J}$, and let $D = \sum_{j=1}^{n} d_j$. Clearly, $D/m \le \mathrm{OPT}$, and a schedule of length at most $D$ can be obtained by scheduling one job after the other. Then, we directly get the following bounds: $D/m \le \mathrm{OPT} \le D$. By dividing all execution times $p_{ij}$ by $D/m$, we assume, without loss of generality, that $D/m = 1$ and

$$1 \le \mathrm{OPT} \le m.$$

Negligible operations. Let us use $N_j = \{O_{ij} : i = 1,\ldots,\mu \text{ and } p_{ij} \le \frac{\varepsilon}{\mu m} d_j\}$ to denote the set of negligible operations for job $J_j$, $j = 1,\ldots,n$. Then we have the following lemma.

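Lemma 4. With $1+2\varepsilon$ loss, we assume that for each job $J_j$ the following holds:

- $p_{ij} = 0$, for $O_{ij} \in N_j$;
- $p_{ij} = \frac{\varepsilon}{\mu m} d_j (1+\varepsilon)^{\pi_{ij}}$, where $\pi_{ij} \in \mathbb{N}$, for $O_{ij} \notin N_j$.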

Proof. Set $p_{ij} = 0$ for every $O_{ij} \in N_j$ and $j = 1,\ldots,n$. Clearly, the corresponding optimal makespan cannot be larger than OPT. Furthermore, if we replace the zero processing times of the negligible operations with the original processing times, we may potentially increase the makespan by at most

$$\sum_{j=1}^{n}\sum_{O_{ij} \in N_j} p_{ij} \le \sum_{j=1}^{n} \mu \cdot \frac{\varepsilon}{\mu m} d_j = \frac{\varepsilon}{m} D = \varepsilon.$$

Any non-negligible operation $O_{ij}$ has processing time $p_{ij}$ greater than $\frac{\varepsilon}{\mu m} d_j$. Round each such $p_{ij}$ down to the nearest value of the form $\frac{\varepsilon}{\mu m} d_j (1+\varepsilon)^{h}$, where $h \in \mathbb{N}$. By replacing the rounded values with the original ones we may potentially increase the makespan by at most a factor of $1+\varepsilon$. □

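Now we define the profile of a job $J_j$ to be a $(\mu \cdot m)$-tuple $(\pi_{1,1},\ldots,\pi_{1,m},\pi_{2,1},\ldots,\pi_{\mu,m})$ such that $p_{ij} = \frac{\varepsilon}{\mu m} d_j (1+\varepsilon)^{\pi_{i,h}}$ if $m_{ij} = h$. We adopt the convention that $\pi_{i,h} = -\infty$ if $p_{ij} = 0$ or if $m_{ij} \ne h$.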
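The rounding of Lemma 4 and the profile definition fit in a few lines of Python. The sketch below uses exact `Fraction` arithmetic, numbers machines from 0 and stores entry $\pi_{i,h}$ at index $i \cdot m + h$; these are choices of the illustration, not of the paper, and the function name is hypothetical.

```python
from fractions import Fraction

NEG_INF = float("-inf")   # stands for the convention pi = -infinity


def round_and_profile(ops, eps, mu, m):
    """Apply the rounding of Lemma 4 to one job and return (rounded, profile).

    ops -- list of (machine, p) for the mu operations of the job, in order;
           machines are numbered 0..m-1 in this sketch
    eps -- the accuracy parameter, as an exact Fraction
    """
    d = sum(p for _, p in ops)
    unit = eps * d / (mu * m)            # (eps / (mu m)) * d_j
    rounded = []
    profile = [NEG_INF] * (mu * m)       # entry (i, h) stored at index i*m + h
    for i, (h, p) in enumerate(ops):
        if p <= unit:                    # negligible operation: set to 0
            rounded.append(Fraction(0))
            continue
        k = 0                            # largest k with unit*(1+eps)^k <= p
        while unit * (1 + eps) ** (k + 1) <= p:
            k += 1
        rounded.append(unit * (1 + eps) ** k)
        profile[i * m + h] = k
    return rounded, tuple(profile)


ops = [(0, 10), (1, 1), (0, 5)]   # mu = 3 operations on m = 2 machines
rounded, profile = round_and_profile(ops, Fraction(1, 2), mu=3, m=2)
print(rounded)   # [Fraction(27, 4), Fraction(0, 1), Fraction(9, 2)]
print(profile)   # (4, -inf, -inf, -inf, 3, -inf)
```

Each non-negligible time is rounded down by less than a factor $1+\varepsilon$, which is the property the proof of Lemma 4 relies on.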

Lemma 5. The number of distinct profiles is bounded by $\ell := \left(2 + \log_{1+\varepsilon} \frac{\mu m}{\varepsilon}\right)^{\mu m}$.

Grouping jobs. Let $\delta := m\rho^{m/\varepsilon}$, with $\rho$ as fixed in the proof of Lemma 6, and partition the set of jobs into two subsets of large jobs $L = \{J_j : d_j > \delta\}$ and small jobs $S = \{J_j : d_j \le \delta\}$. Again, we further partition the set of small jobs into subsets $S_i$ of jobs having the same profile, for $i = 1,\ldots,\ell$. For each subset $S_i$, we group jobs from $S_i$ as described for the unrelated parallel machines scheduling problem, but with the following difference: here grouping two jobs $J_a$ and $J_b$ means forming a composed job for which the processing time of the $i$-th operation is equal to the sum of the processing times of the $i$-th operations of $J_a$ and $J_b$.
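Composing jobs is component-wise addition of their operation-time vectors; since jobs with the same profile have operation vectors proportional to $d_j$, a composed job keeps that profile. The grouping rule itself is defined in the earlier section on unrelated machines, which is not reproduced here, so the Python sketch below assumes a simple greedy rule (add the next job of the profile while the composed job's total length stays at most $\delta$). Treat that rule and the function name as hypothetical.

```python
def group_same_profile(jobs, delta):
    """Greedily merge jobs of one profile into composed jobs (hypothetical rule).

    jobs  -- operation-time vectors of equal length mu (same profile)
    delta -- size threshold; a job joins the current group only if the
             group's total length stays <= delta
    Composing two jobs adds their i-th operation times component-wise.
    """
    groups, current, current_total = [], None, 0
    for ops in jobs:
        d = sum(ops)
        if current is not None and current_total + d <= delta:
            current = [a + b for a, b in zip(current, ops)]
            current_total += d
        else:
            if current is not None:
                groups.append(current)
            current, current_total = list(ops), d
    if current is not None:
        groups.append(current)
    return groups


jobs = [[1, 2, 1], [1, 2, 1], [2, 4, 2], [1, 2, 1]]   # proportional vectors
print(group_same_profile(jobs, delta=10))   # [[2, 4, 2], [2, 4, 2], [1, 2, 1]]
```

Under this rule any two consecutive groups together exceed $\delta$, so a profile with total length $D_\phi$ yields at most $2D_\phi/\delta + 1$ composed jobs.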

Lemma 6. With $1+2\varepsilon$ loss, the number of jobs can be reduced in linear time to at most $\min\{n, \ldots + \ell\}$.

Proof. Consider the transformed instance $I'$ according to Lemma 4 (small jobs are not yet grouped). Assume, w.l.o.g., that there exists a solution SOL' for $I'$ of value OPT'. It is sufficient to show that, by using the small jobs grouped as described previously, there exists a schedule of value $(1+2\varepsilon)\mathrm{OPT}'$.

Partition the set of large jobs into three subsets $L_1$, $L_2$, $L_3$ as follows. Set $\rho = \frac{\varepsilon}{4m^2\mu^4}$ and let $\alpha$ denote an integer, defined later, with $\alpha \in \{0, 1, \ldots, m/\varepsilon - 1\}$. The set $L$ is partitioned into the following three subsets:

$$L_1 = \{J_j : m\rho^{\alpha} < d_j\},$$

$$L_2 = \{J_j : m\rho^{\alpha+1} < d_j \le m\rho^{\alpha}\},$$

$$L_3 = \{J_j : m\rho^{m/\varepsilon} < d_j \le m\rho^{\alpha+1}\}.$$

Note that the size of each of these sets is bounded by a constant, and the sets $L_1$, $L_2$, $L_3$ and $S$ form a partition of all jobs. The number $\alpha$ can be chosen so that

$$\sum_{J_j \in L_2} d_j \le \varepsilon. \qquad (1)$$

This is done as follows. Starting from $\alpha := 0$, check each time whether the set $L_2$ corresponding to the current value of $\alpha$ satisfies inequality (1); if it does not, set $\alpha := \alpha + 1$ and repeat. Note that for different $\alpha$-values, the corresponding $L_2$ sets are disjoint. The total length of all jobs is $m$, and so there exists a value $\alpha' \le m/\varepsilon - 1$ for which the corresponding set $L_2$ satisfies inequality (1) (otherwise the $m/\varepsilon$ disjoint candidate sets would have total length greater than $(m/\varepsilon) \cdot \varepsilon = m$). We set $\alpha := \alpha'$.
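The search for $\alpha$ is a plain linear scan. The Python sketch below (function name hypothetical, exact fractions, and an illustrative $\rho$ rather than the paper's value) returns the first $\alpha$ whose $L_2$ set satisfies inequality (1); by the pigeonhole argument above, the scan cannot fail when the $d_j$ sum to $m$.

```python
from fractions import Fraction


def choose_alpha(d, m, rho, eps):
    """Return the first alpha in 0..m/eps-1 whose L2 set has total length <= eps.

    d is the list of total job lengths d_j (scaled so that sum(d) == m);
    L2(alpha) = {j : m*rho^(alpha+1) < d_j <= m*rho^alpha}.
    """
    for alpha in range(int(m / eps)):
        upper = m * rho ** alpha
        lower = m * rho ** (alpha + 1)
        if sum(x for x in d if lower < x <= upper) <= eps:
            return alpha
    raise ValueError("no alpha found; check that sum(d) == m")


d = [Fraction(1), Fraction(9, 10), Fraction(1, 20), Fraction(1, 20)]
print(choose_alpha(d, m=2, rho=Fraction(1, 10), eps=Fraction(1, 2)))   # 1
```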

In the following we consider an artificial situation that we use as a tool. Focus on solution SOL', and remove from SOL' all the jobs except those from $L_1$. Clearly, this creates gaps in the resulting schedule $\sigma$. The starting and finishing times of the operations from $L_1$ divide the time into intervals $t_1, t_2, \ldots, t_g$, where $t_1$ is the interval whose right boundary is the starting time of the first operation according to $\sigma$, and $t_g$ is the interval whose left boundary is the finishing time of the jobs from $L_1$. Furthermore, let $l_v$ denote the length of interval $t_v$, $1 \le v \le g$. It follows that $\mathrm{OPT}' = \sum_{v=1}^{g} l_v$. Note that the number $g$ of intervals is bounded by $2\mu|L_1| + 1 \le 2\mu/\rho^{\alpha}$. Let $G$ be the set of pairs (gaps of $\sigma$) $(v, i)$ such that no job from $L_1$ is processed in interval $v$ by machine $i$, for $v = 1,\ldots,g$ and $i = 1,\ldots,m$.
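To make the interval structure concrete, the Python sketch below cuts the time axis at all start and finish times of the $L_1$ operations and lists the gap pairs $(v, i)$. It assumes, for illustration, that the first interval starts at time 0 and the last one ends at the makespan OPT', and it numbers intervals and machines from 0; the function name is hypothetical.

```python
def intervals_and_gaps(ops, makespan, m):
    """Cut [0, makespan] at the start/finish times of the L1 operations.

    ops -- list of (machine, start, finish) for the operations of L1 jobs
    Returns the intervals as (left, right) pairs and the gap pairs (v, i)
    such that machine i processes no L1 operation during interval v.
    """
    points = sorted({0, makespan} | {s for _, s, _ in ops} | {f for _, _, f in ops})
    intervals = list(zip(points, points[1:]))
    gaps = [(v, i)
            for v, (left, right) in enumerate(intervals)
            for i in range(m)
            if not any(mach == i and s < right and left < f for mach, s, f in ops)]
    return intervals, gaps


ops = [(0, 0, 2), (1, 2, 5), (0, 5, 6)]
intervals, gaps = intervals_and_gaps(ops, makespan=8, m=2)
print(intervals)   # [(0, 2), (2, 5), (5, 6), (6, 8)]
print(gaps)        # [(0, 1), (1, 0), (2, 1), (3, 0), (3, 1)]
```

Because every start and finish time is a breakpoint, each machine is either busy for a whole interval or idle for all of it, and there are at most $2\mu|L_1| + 1$ intervals.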



Grouping Techniques for Scheduling Problems: Simpler and Faster 215 



Since the total length of jobs from $L_2$ is at most $\varepsilon$, we can get rid of these jobs by assuming that they are processed at the end of the schedule one after the other; this increases the schedule length by at most $\varepsilon$.

Consider the problem of placing jobs from $L_3 \cup S$ into the gaps of $\sigma$ such that the length of the resulting schedule is OPT'. In the following we first describe a linear program LP1 which is a relaxation of this problem. Then we propose another linear program LP2 which is a relaxation of LP1. By using the solution of LP2 we show that the jobs from $L_3$ and the grouped small jobs can be scheduled into the gaps by increasing the length of the gaps a little; we show that the total increase of the length can be bounded by $\varepsilon$, and the claim follows.

We formulate LP1 as follows. Consider the set $S$ of (not grouped) small jobs. For each job $J_j \in S \cup L_3$ we use a set of decision variables $x_{j\tau} \in [0,1]$ for tuples $\tau = (\tau_1,\ldots,\tau_\mu) \in A$, where $A = \{(\tau_1,\ldots,\tau_\mu) \mid 1 \le \tau_1 \le \tau_2 \le \cdots \le \tau_\mu \le g\}$. The meaning of these variables is that $x_{j\tau}$ represents the fraction of job $J_j$ whose operations are processed according to $\tau = (\tau_1,\ldots,\tau_\mu)$, i.e., the $k$-th operation is scheduled in interval $\tau_k$ for each $1 \le k \le \mu$. Note that, by the way in which we numbered the operations, any tuple $(\tau_1,\ldots,\tau_\mu) \in A$ represents a valid ordering for the operations. Let the load on machine $h$ in interval $v$ be defined as the total processing time of operations from jobs in $S \cup L_3$ that are executed by machine $h$ during interval $v$, i.e.,

$$L_{v,h} = \sum_{J_j \in S \cup L_3} \sum_{\tau \in A} \sum_{k = 1,\ldots,\mu \,:\, \tau_k = v,\ m_{kj} = h} x_{j\tau}\, p_{kj}.$$
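The tuple set $A$ is the set of nondecreasing $\mu$-tuples over $\{1,\ldots,g\}$, so $|A| = \binom{g+\mu-1}{\mu}$, which is a constant once $g$ and $\mu$ are. The Python sketch below enumerates $A$ with `itertools.combinations_with_replacement` and evaluates $L_{v,h}$ for a given fractional assignment; the data layout (dictionaries keyed by job and tuple, 0-indexed machines) is an assumption of the illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def tuple_set(g, mu):
    """All nondecreasing tuples (tau_1, ..., tau_mu) with 1 <= tau_k <= g."""
    return list(combinations_with_replacement(range(1, g + 1), mu))


def load(v, h, x, ops):
    """L_{v,h}: processing time placed on machine h in interval v.

    x   -- dict mapping (job, tau) to the fraction x_{j tau}
    ops -- dict mapping job to its list of (machine, p) operations
    """
    return sum(frac * p
               for (job, tau), frac in x.items()
               for k, (machine, p) in enumerate(ops[job])
               if tau[k] == v and machine == h)


A = tuple_set(3, 2)
print(len(A))   # 6
ops = {0: [(0, 4), (1, 2)]}          # one job: 1st op on machine 0, 2nd on machine 1
x = {(0, (1, 2)): Fraction(1, 2), (0, (2, 2)): Fraction(1, 2)}
print(load(1, 0, x, ops), load(2, 1, x, ops))   # 2 2
```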

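Let us write the load $L_{v,h}$ as the sum $L^{L}_{v,h} + L^{S}_{v,h}$, where $L^{L}_{v,h}$ is the load of the jobs from $L_3$ while $L^{S}_{v,h}$ is the load of jobs from $S$. By Lemma 4, we have that

$$L^{S}_{v,h} = \sum_{\tau \in A} \sum_{J_j \in S} \sum_{k = 1,\ldots,\mu \,:\, \tau_k = v,\ m_{kj} = h} x_{j\tau}\, \frac{\varepsilon}{\mu m} d_j (1+\varepsilon)^{\pi_{kj}}.$$

Then LP1 is the following: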

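$$
\begin{aligned}
&(1)\quad L^{L}_{v,h} + L^{S}_{v,h} \le l_v, && (v,h) \in G;\\
&(2)\quad \sum_{\tau \in A} x_{j\tau} = 1, && J_j \in S \cup L_3;\\
&(3)\quad x_{j\tau} \ge 0, && \tau \in A,\ J_j \in S \cup L_3.
\end{aligned}
$$

Constraint (1) ensures that the total length of operations assigned to gap $(v,h)$ does not exceed the length of the interval, while constraint (2) ensures that job $J_j$ is completely scheduled.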

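Let $S_\phi$ denote the set of small jobs having the same profile, where $\phi = 1,\ldots,\ell$. For each $\phi = 1,\ldots,\ell$ we use a set of decision variables $y_{\phi\tau} \in [0,1]$ for tuples $\tau = (\tau_1,\ldots,\tau_\mu) \in A$. The meaning of these variables is that $y_{\phi\tau}$ represents the fraction of jobs from $S_\phi$ whose operations are processed according to $\tau = (\tau_1,\ldots,\tau_\mu)$, i.e., the $k$-th operation is scheduled in interval $\tau_k$ for each $1 \le k \le \mu$. Let

$$\bar L^{S}_{v,h} = \sum_{\tau \in A} \sum_{k = 1,\ldots,\mu \,:\, \tau_k = v} \sum_{\phi=1}^{\ell} y_{\phi\tau} \sum_{J_j \in S_\phi} \frac{\varepsilon}{\mu m} d_j (1+\varepsilon)^{\pi_{k,h}},$$

where $(\pi_{k,h})$ is the common profile of the jobs in $S_\phi$ (recall that $\pi_{k,h} = -\infty$, so that the term vanishes, unless the $k$-th operation is processed on machine $h$). Then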

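LP2 is the following:

$$
\begin{aligned}
&(1)\quad L^{L}_{v,h} + \bar L^{S}_{v,h} \le l_v, && (v,h) \in G;\\
&(2)\quad \sum_{\tau \in A} x_{j\tau} = 1, && J_j \in L_3;\\
&(3)\quad \sum_{\tau \in A} y_{\phi\tau} = 1, && \phi = 1,\ldots,\ell;\\
&(4)\quad x_{j\tau} \ge 0, && \tau \in A,\ J_j \in L_3;\\
&(5)\quad y_{\phi\tau} \ge 0, && \tau \in A,\ \phi = 1,\ldots,\ell.
\end{aligned}
$$

By setting

$$y_{\phi\tau} = \frac{\sum_{J_j \in S_\phi} x_{j\tau}\, d_j}{\sum_{J_j \in S_\phi} d_j},$$

it is easy to check that any feasible set of values $(x_{j\tau})$ for LP1 gives a feasible set of values $(y_{\phi\tau})$ for LP2. Since by construction a feasible solution for LP1 exists, a feasible solution for LP2 exists as well. We now show that by using the optimal solution of LP2 we can derive a schedule without increasing the makespan too much.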

Let $y^*_{\phi\tau}$ ($x^*_{j\tau}$) denote the values of the variables $y_{\phi\tau}$ ($x_{j\tau}$) according to the optimal solution of LP2. For every positive value $y^*_{\phi\tau}$, schedule a subset $H_{\phi\tau}$ of grouped jobs from $S_\phi$ according to $\tau$ until either (a) the jobs from $S_\phi$ are exhausted or (b) the total fraction (i.e., $\sum_{J_j \in H_{\phi\tau}} d_j / \sum_{J_j \in S_\phi} d_j$) of jobs assigned to $\tau$ is equal to $y^*_{\phi\tau}$ (if necessary, fractionalize one job to use up $y^*_{\phi\tau}$ exactly). We repeat this for the not yet assigned grouped small jobs and for every positive value $y^*_{\phi\tau}$. Note that if $y^*_{\phi\tau}$ is not fractional, then the jobs from $S_\phi$ are not preempted by this procedure. In general, the number of preempted jobs from $S_\phi$ is at most $f_\phi - 1$, where $f_\phi = |\{y^*_{\phi\tau} : y^*_{\phi\tau} > 0,\ \tau \in A\}|$. According to the optimal solution of LP2, let us say that job $J_j \in L_3$ is preempted if the corresponding $(x^*_{j\tau})$-values are fractional. Let $f_j = |\{x^*_{j\tau} : x^*_{j\tau} > 0,\ \tau \in A\}|$ for $J_j \in L_3$; then the number of preemptions of job $J_j \in L_3$ is $f_j - 1$. Therefore, the total number of preemptions is

$$f = \sum_{\phi=1}^{\ell} (f_\phi - 1) + \sum_{J_j \in L_3} (f_j - 1),$$

and this also gives an upper bound on the number of preempted jobs. Now remove all the preempted jobs $J_j$ from $S \cup L_3$, and schedule this set of jobs at the end of the schedule, one after the other. Since every (grouped) small job has a smaller total processing time than any job from $L_3$, we can bound the total increase of the schedule length by $\Delta = f \cdot m\rho^{\alpha+1}$.

A basic feasible solution of LP2 has the property that the number of positive variables is at most the number of rows in the constraint matrix, $mg + \ell + |L_3|$; therefore $f \le mg \le 2m\mu/\rho^{\alpha}$, and $\Delta = f \cdot m\rho^{\alpha+1} \le 2m^2\mu\rho$. By the previous algorithm, we have assigned all the jobs from $S \cup L_3$ to gaps with a total increase of the schedule length of at most $2m^2\mu\rho$. Now we consider the problem of scheduling jobs from $S \cup L_3$ within each interval. This is simply a smaller instance of the job shop problem, and by using Sevastianov's algorithm […] it is possible to find a feasible schedule for each interval $t_v$ of length at most $l_v + \mu^3 m \cdot m\rho^{\alpha+1}$; this increases the total length by at most $\mu^3 m g \cdot m\rho^{\alpha+1} \le 2m^2\mu^4\rho$. Therefore, the total increase is $2m^2\mu\rho + 2m^2\mu^4\rho \le 4m^2\mu^4\rho$, and by setting $\rho = \frac{\varepsilon}{4m^2\mu^4}$ the claim follows. □



By the previous lemma, a $(1+\varepsilon)$-approximate schedule can be obtained by finding the optimum schedule for the reduced set of jobs. It follows that:

Theorem 2. There exists a linear time PTAS for the job shop scheduling problem whose multiplicative constant hidden in the $O(n)$ running time is reasonably small and does not depend on the error $\varepsilon$, whereas the additive constant is exponential in $m$, $\mu$ and $1/\varepsilon$.

Abstract. In this paper, we consider a scheduling problem of vehicles on a path. Let $G = (V, E)$ be a path, where $V = \{v_1, v_2, \ldots, v_n\}$ is its set of $n$ vertices and $E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, \ldots, n-1\}$ is its set of edges. There are $m$ identical vehicles ($1 \le m \le n$). The travel times $w(v_j, v_{j+1})$ $(= w(v_{j+1}, v_j))$ are associated with edges $\{v_j, v_{j+1}\} \in E$. Each job $j$, which is located at vertex $v_j \in V$, has release time $r_j$ and handling time $h_j$. Any job must be served by exactly one vehicle. The problem asks to find an optimal schedule of $m$ vehicles that minimizes the maximum completion time of all the jobs. The problem is known to be NP-hard for any fixed $m \ge 2$. In this paper, we give an $O(mn^2)$ time 2-approximation algorithm for the problem, by using properties of optimal gapless schedules.



1 Introduction 

In this paper, we consider a scheduling problem of vehicles on a path with release and handling times. The scheduling problem of vehicles, such as AGVs (automated guided vehicles), handling robots, buses, trucks and so forth, on a given road network is an important topic encountered in various applications. In particular, in an FMS (flexible manufacturing system) environment, the scheduling of the movement of AGVs, which carry materials and products between machining centers, has a vital effect on the system efficiency.

The single-vehicle scheduling problem (VSP, for short) contains the traveling salesman problem (TSP) and the delivery man problem (DMP) […] as its special cases. In the TSP, a salesman (a vehicle) visits each of $n$ customers (jobs) situated at different locations on a given network before returning to the initial location. The objective is to minimize the tour length. In the DMP, the same scenario is considered but the objective is to minimize the total completion time of all the jobs. The VSP usually takes into account the time constraints of jobs (i.e., release, handling and/or due times), and therefore other important objective functions, such as the tour time, the maximum completion time of jobs, the maximum lateness from the due times and so forth, are also considered. Since paths and trees are important network topologies from both practical and graph-theoretical viewpoints, VSPs on these networks have been studied in several papers, e.g., Psaraftis, Solomon, Magnanti and Kim […], Tsitsiklis […], Averbakh and Berman [5], Karuno, Nagamochi and Ibaraki […], Nagamochi, Mochizuki and Ibaraki […], and Asano, Katoh and Kawashima […].

The multi-vehicle scheduling problem (MVSP, for short) is a more general problem than the VSP. The MVSP on a general network can be viewed as a variant of the so-called vehicle routing problem with time windows (VRPTW) (e.g., see Desrosiers, Dumas, Solomon and Soumis […]). The typical VRPTW uses vehicles with limited capacity, and all the vehicles start their routes from a common depot and return to the depot. The objective is to minimize the total tour length (or the total tour time) of all the vehicles under the minimum number of routes. Accordingly, the MVSP may also appear in a variety of industrial and service sector applications, e.g., bank deliveries, postal deliveries, industrial refuse collection, distribution of soft drinks, beer and gasoline, school bus routing, transportation of handicapped people and security patrol services.

The multi-vehicle problem on a path to be discussed here is called PATH-MVSP, and the number of vehicles is denoted by $m$ ($1 \le m \le n$). Problem PATH-MVSP asks to find an optimal schedule of $m$ vehicles (i.e., their optimal sequences of jobs) that minimizes the maximum completion time of all the jobs. Note that the objective is equivalent to minimizing the maximum workload of all the vehicles. In Averbakh and Berman [4, 5], dealing with an MTSP (i.e., multiple traveling salesmen problem) on a tree to minimize the maximum tour length of all the vehicles, the authors mention that such a minmax objective may be motivated, first, by the desire to distribute the workload to the vehicles in a fair way and, second, by natural restrictions such as the limited working day of the vehicles. They presented an approximation algorithm with worst-case performance ratio $2 - 2/(m+1)$ and running time $O(n^{m-1})$ for the MTSP. They also considered an MDMP (i.e., multiple delivery men problem) on a path in a different paper (Averbakh and Berman [2]) and showed that it can be solved in $O(n^{\ldots})$ time.

The PATH-MVSP is NP-hard for any fixed $m \ge 2$, and is NP-hard in the strong sense for arbitrary $m$, since PARTITION and 3-PARTITION (e.g., see Garey and Johnson [7]) are its special cases, respectively. The PATH-MVSP with $m = 1$ (i.e., the VSP on a path) was proved by Tsitsiklis to be NP-hard if the initial location of a vehicle is specified. The PATH-MVSP with $m = 1$ is 2-approximable due to the results by Psaraftis et al. […], and it was shown to be 1.5-approximable by Karuno et al. […] if the initial and goal locations of a vehicle are specified as one end of the path.

In this paper, we consider the PATH-MVSP with symmetric edge weights. For a schedule of the problem, we refer to a maximal subpath of a given path which is traversed by a certain vehicle as its zone. A feasible schedule using $m'$ vehicles ($m' \le m$) is called a zone schedule if any two zones do not intersect, and thus there are $m' - 1$ edges which are not traversed by any vehicle. Such an edge that is not traversed by any vehicle is called a gap. Conversely, a schedule is called gapless if each edge is traversed at least once by some vehicle. When the fleet of vehicles follows a zone schedule, no two vehicles interfere with each other on the given path. As such non-interference between the vehicles is important in controlling them, zone schedules are often required in practice. Hence it is natural to ask how much larger the maximum completion time becomes if we restrict ourselves to zone schedules. In this paper, for the PATH-MVSP of finding an optimal gapless schedule (with symmetric edge weights), we prove that there always exists a zone schedule with maximum completion time at most twice the optimal. Based on this fact, we present a polynomial time 2-approximation algorithm for the problem.

The remainder of this paper is organized as follows. After providing the mathematical description of the PATH-MVSP in Section 2, we discuss in Section 3 how to construct a zone schedule whose maximum completion time is at most twice the maximum completion time of any gapless schedule. In Section 4, by using dynamic programming, we present an $O(mn^2)$ time 2-approximation algorithm for the PATH-MVSP. In Section 5, we give a conclusion.

2 Multi-vehicle Scheduling Problem on a Path 

2.1 Problem Description 

Problem PATH-MVSP is formulated as follows. Let $G = (V, E)$ be a path network, where $V = \{v_1, v_2, \ldots, v_n\}$ is a set of $n$ vertices and $E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, \ldots, n-1\}$ is a set of edges. In this paper, we assume that vertex $v_1$ is the left end of $G$, and $v_n$ the right end of it. There is a job $j$ at each vertex $v_j \in V$. The job set is denoted by $J = \{j \mid j = 1, 2, \ldots, n\}$. There are $m$ vehicles on $G$ ($1 \le m \le n$), which are assumed to be identical. Each job must be served by exactly one vehicle.

The travel time of a vehicle is $w(v_j, v_{j+1}) \ge 0$ to traverse $\{v_j, v_{j+1}\} \in E$ from $v_j$ to $v_{j+1}$, and is $w(v_{j+1}, v_j) \ge 0$ to traverse it in the opposite direction. Edge weight $w(v_j, v_{j+1})$ for $\{v_j, v_{j+1}\} \in E$ is called symmetric if $w(v_j, v_{j+1}) = w(v_{j+1}, v_j)$ holds. In this paper, we assume that all edge weights are symmetric. The travel time for a vehicle to move from vertex $v_i$ to vertex $v_j$ on $G$ is the sum of edge weights belonging to the unique path from $v_i$ to $v_j$. Each job $j \in J$ has its release time $r_j \ge 0$ and handling time $h_j \ge 0$: that is, a vehicle cannot start serving job $j$ before time $r_j$, and it takes $h_j$ time units to serve job $j$ (no interruption of the service is allowed). A vehicle at vertex $v_j$ may wait until time $r_j$ to serve job $j$, or move to other vertices without serving job $j$ if it is more advantageous (in this case, the vehicle must come back to $v_j$ later to serve job $j$, or another vehicle must come to $v_j$ to serve it). An instance of the problem PATH-MVSP is denoted by $(G\ (= (V, E)), r, h, w, m)$.

A motion schedule of the $m$ vehicles is completely specified by $m$ sequences of jobs $\pi^{[k]} = (j^{[k]}_1, j^{[k]}_2, \ldots, j^{[k]}_{n_k})$, $k = 1, 2, \ldots, m$, where $n_k$ is the number of jobs to be served by vehicle $k$ (hence, it holds that $\sum_{k=1}^{m} n_k = n$), and $j^{[k]}_i$ is its $i$-th job; i.e., vehicle $k$ is initially situated at vertex $v_{j^{[k]}_1}$, starts serving job $j^{[k]}_1$ at time $\max\{0, r_{j^{[k]}_1}\}$, and takes $h_{j^{[k]}_1}$ time units to serve it. After completing job $j^{[k]}_1$, the vehicle immediately moves to $v_{j^{[k]}_2}$ along the unique path from $v_{j^{[k]}_1}$ to $v_{j^{[k]}_2}$, taking the travel time of the path (i.e., $w(v_{j^{[k]}_1}, v_{j^{[k]}_1+1}) + \cdots + w(v_{j^{[k]}_2-1}, v_{j^{[k]}_2})$ or $w(v_{j^{[k]}_1}, v_{j^{[k]}_1-1}) + \cdots + w(v_{j^{[k]}_2+1}, v_{j^{[k]}_2})$), and serves job $j^{[k]}_2$ after waiting until time $r_{j^{[k]}_2}$ if necessary, and so on, until it completes the last job $j^{[k]}_{n_k}$. A schedule is denoted by a set of $m$ sequences of jobs $\pi = \{\pi^{[1]}, \pi^{[2]}, \ldots, \pi^{[m]}\}$. The completion time of vehicle $k$ (i.e., its workload) is defined as the completion time of its last job $j^{[k]}_{n_k}$, which is denoted by $C(\pi^{[k]})$. The objective is to find a $\pi$ that minimizes the maximum completion time of all the jobs, i.e.,

$$C_{\max}(\pi) = \max_{1 \le k \le m} C(\pi^{[k]}). \qquad (1)$$

In this paper, we denote by $\pi^*$ an optimal schedule and by $C^*_{\max}$ the optimal value $C_{\max}(\pi^*)$.
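The completion-time rule above translates directly into a small simulator. The Python sketch below is an illustration under stated conventions of our own (0-indexed jobs and vertices, `w[j]` for the symmetric weight of edge $\{v_j, v_{j+1}\}$); the function names are hypothetical.

```python
from itertools import accumulate


def completion_time(seq, r, h, w):
    """C(pi^[k]) for one vehicle serving the jobs of seq in that order.

    Jobs and vertices are 0-indexed; w[j] is the (symmetric) weight of edge
    {v_j, v_{j+1}}. The vehicle starts at the vertex of its first job at
    time 0 and never waits longer than a release time forces it to.
    """
    pos = [0] + list(accumulate(w))       # pos[j] = distance from v_0 to v_j
    t, cur = 0, None
    for j in seq:
        if cur is not None:
            t += abs(pos[j] - pos[cur])   # travel along the unique path
        t = max(t, r[j]) + h[j]           # wait until r_j if needed, then serve
        cur = j
    return t                              # 0 for an empty sequence


def c_max(schedule, r, h, w):
    """C_max(pi) of Eq. (1): the largest completion time over all vehicles."""
    return max((completion_time(seq, r, h, w) for seq in schedule), default=0)


r, h, w = [0, 3, 1], [1, 1, 2], [2, 1]
print(completion_time([0, 1, 2], r, h, w), completion_time([2, 1, 0], r, h, w))   # 7 8
```

For a fixed job sequence, serving each job as early as possible is never worse than inserting extra waiting, so this greedy simulation gives the completion time of the best timing for that sequence.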

2.2 Subpath and Subinstance 

Let $V(i,j) = \{v_i, v_{i+1}, \ldots, v_j\}$ $(\subseteq V)$, where $i \le j$. Define $G(i,j) = (V(i,j), E(i,j))$ to be the subpath of a given path $G = (V, E)$ induced by $V(i,j)$, where $E(i,j) = \{\{v_{j'}, v_{j'+1}\} \mid j' = i, \ldots, j-1\}$ $(\subseteq E)$, and let $J(i,j) = \{i, i+1, \ldots, j\}$ $(\subseteq J)$ be the corresponding subset of jobs of the subpath $G(i,j)$. This definition states that $G(1,n) = G$ and $J(1,n) = J$.

Next consider the scheduling problem of $k$ $(\le m)$ vehicles on $G(i,j)$. This is a subinstance of the original instance $(G, r, h, w, m)$. We denote this subinstance by $(G(i,j), r, h, w, k)$; i.e., scheduling $k$ vehicles on subpath $G(i,j) = (V(i,j), E(i,j))$ of the given path $G$ with release times $r_{j'}$ and handling times $h_{j'}$ for $j' \in J(i,j)$ and with edge weights $w(v_{j'}, v_{j'+1})$ $(= w(v_{j'+1}, v_{j'}))$ for $\{v_{j'}, v_{j'+1}\} \in E(i,j)$ (hence, the original instance is denoted by $(G, r, h, w, m)$ as well as $(G(1,n), r, h, w, m)$).

2.3 Zone Schedule and Gapless Schedule 

For a schedule $\pi$, assume that a vehicle covers a subpath $G(i,j) = (V(i,j), E(i,j))$: that is, all jobs served by the vehicle are on $G(i,j)$, and the two jobs $i$ and $j$ located at the end vertices of $G(i,j)$ are served by it. However, there may be some jobs $j'$ ($i < j' < j$) served by other vehicles. Then, the subpath $G(i,j)$ is referred to as the zone of the vehicle.

A feasible schedule $\pi$ using $m'$ vehicles ($m' \le m$) is referred to as a zone schedule if any two zones in $\pi$ do not intersect, and thus there are $m' - 1$ edges which are not traversed by any vehicle. Such an edge that is not traversed by any vehicle is called a gap. Moreover, a zone schedule is called a 1-way zone schedule if every vehicle traverses each edge of its zone exactly once (hence each vehicle traverses its zone from left to right or from right to left). A schedule $\pi$ is called gapless if each edge $\{v_j, v_{j+1}\} \in E$ is traversed at least once (from $v_j$ to $v_{j+1}$ or from $v_{j+1}$ to $v_j$) by some vehicle.
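Since a vehicle starts at its first job and moves along the path, the subpath it traverses runs from the leftmost to the rightmost vertex it visits. Under that observation, the Python sketch below (0-indexed vertices; function names hypothetical) checks whether a schedule is a zone schedule and whether it is gapless.

```python
def zones(schedule):
    """(leftmost, rightmost) vertex index visited by each nonempty vehicle."""
    return [(min(seq), max(seq)) for seq in schedule if seq]


def is_zone_schedule(schedule):
    """True if no two zones share a vertex, i.e., no two zones intersect."""
    zs = sorted(zones(schedule))
    return all(left[1] < right[0] for left, right in zip(zs, zs[1:]))


def is_gapless(schedule, n):
    """True if every edge {v_j, v_{j+1}}, j = 0..n-2, is traversed by some vehicle."""
    covered = set()
    for lo, hi in zones(schedule):
        covered.update(range(lo, hi))     # a vehicle traverses edges lo..hi-1
    return covered == set(range(n - 1))


print(is_zone_schedule([[0, 1], [2, 3]]), is_gapless([[0, 1], [2, 3]], 4))   # True False
print(is_zone_schedule([[0, 2], [1, 3]]), is_gapless([[0, 2], [1, 3]], 4))   # False True
```

With two or more vehicles serving jobs, a zone schedule always leaves at least one gap, so the two properties exclude each other except in the single-zone case.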

A zone schedule does not always attain an optimal value. Consider the following example:

Example 1. Let $G = (V, E)$ be a path with $m = 2$ vehicles and $n = 4$ jobs, where $V = \{v_1, v_2, v_3, v_4\}$ and $E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, 3\}$. For a positive constant $B$, edge weights are given by $w(v_1, v_2) = w(v_3, v_4) = B$ and $w(v_2, v_3) = 0$, release times are given by $r_1 = r_2 = 0$, $r_3 = B$ and $r_4 = 2B$, and handling times are given by $h_1 = h_4 = 0$ and $h_2 = h_3 = B$. □

In this example, any zone schedule takes at least $3B$ time to complete all jobs, while an optimal schedule is given by $\pi^{[1]} = (1, 3)$ and $\pi^{[2]} = (2, 4)$, whose maximum completion time is $2B$.
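The claim of Example 1 can be checked by brute force. The Python sketch below fixes $B = 1$ (our choice; all times scale with $B$), uses 0-indexed jobs, and compares the best zone schedule (two contiguous zones, any service order inside each) with the best unrestricted assignment of jobs to the two vehicles.

```python
from itertools import accumulate, permutations, product

B = 1
r = [0, 0, B, 2 * B]
h = [0, B, B, 0]
w = [B, 0, B]                       # w[j] is the weight of edge {v_j, v_{j+1}}
pos = [0] + list(accumulate(w))


def completion_time(seq):
    t, cur = 0, None
    for j in seq:
        if cur is not None:
            t += abs(pos[j] - pos[cur])   # travel along the path
        t = max(t, r[j]) + h[j]           # wait for release, then serve
        cur = j
    return t


def best_order(jobs):
    """Smallest completion time over all service orders of a job set."""
    return min(completion_time(p) for p in permutations(jobs))


# Zone schedules with 2 vehicles: split the path into two contiguous zones.
best_zone = min(max(best_order(range(0, s)), best_order(range(s, 4)))
                for s in range(0, 5))
# Arbitrary schedules: any assignment of jobs to the 2 vehicles.
best_any = min(max(best_order([j for j in range(4) if a[j] == 0]),
                   best_order([j for j in range(4) if a[j] == 1]))
               for a in product((0, 1), repeat=4))
print(best_zone, best_any)   # 3 2
```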

3 A 2-Approximation Algorithm for Finding an Optimal Gapless Schedule

In this section, we show that the PATH-MVSP of finding an optimal gapless 
schedule is polynomially approximable within twice the optimal. 

3.1 Notations 

We use the following notations:

• $r_{\max}(i,j) = \max_{i \le j' \le j} r_{j'}$: the maximum release time for jobs in $J(i,j)$.

• $r_{\max} = r_{\max}(1,n)$: the maximum release time for all the jobs in $J$.

• $H(i,j) = \sum_{j'=i}^{j} h_{j'}$: the sum of handling times for jobs in $J(i,j)$.

• $H = H(1,n)$: the sum of handling times for all the jobs in $J$.

• $W(i,j) = \sum_{j'=i}^{j-1} w(v_{j'}, v_{j'+1})$: the sum of edge weights for edges in $E(i,j)$.

• $W = W(1,n)$: the sum of edge weights for all the edges in $E$.

The following lower bounds on the minimum of the maximum completion time of a gapless schedule in an instance $(G, r, h, w, m)$ are immediately obtained:

$$LB1 = \max_{1 \le j \le n} \{r_j + h_j\}, \qquad LB2 = \frac{W + H}{m}, \qquad LB3 = \max_{1 \le j \le n-1} w(v_j, v_{j+1}), \qquad (2)$$

where LB1 is the maximum completion time to serve a single job, LB2 is the least possible maximum workload (in a gapless schedule every edge is traversed and every job is handled, so the total workload of the $m$ vehicles is at least $W + H$), and LB3 is the maximum time to travel an edge in the path. We denote the maximum of these lower bounds by

$$\gamma = \max\{LB1, LB2, LB3\}. \qquad (3)$$
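Computing the bounds of Eqs. (2) and (3) is a single pass over the input. The Python sketch below (0-indexed, `w[j]` for edge $\{v_j, v_{j+1}\}$, function name hypothetical) keeps LB2 exact with `Fraction`.

```python
from fractions import Fraction


def lower_bounds(r, h, w, m):
    """(LB1, LB2, LB3, gamma) from Eqs. (2) and (3); w[j] is edge {v_j, v_{j+1}}."""
    lb1 = max(rj + hj for rj, hj in zip(r, h))   # one job alone
    lb2 = Fraction(sum(w) + sum(h), m)           # average workload (W + H) / m
    lb3 = max(w, default=0)                      # longest single edge
    return lb1, lb2, lb3, max(lb1, lb2, lb3)


# Example 1 with B = 1: LB1 = LB2 = gamma = 2 and LB3 = 1.
print(lower_bounds([0, 0, 1, 2], [0, 1, 1, 0], [1, 0, 1], m=2))
```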
3.2 Auxiliary Path and $\gamma$-Splitting Point Set

In this subsection, for a given path $G = (V, E)$, we introduce an auxiliary path $G_A = (V_A, E_A)$ and a set $\Gamma$ of points on $G_A$.

The auxiliary path $G_A = (V_A, E_A)$ is defined as follows:

$$V_A = \{u_1, u_2, \ldots, u_{2n}\}, \qquad E_A = E_V \cup E_E, \text{ where}$$

$$E_V = \{(u_{2j-1}, u_{2j}) \mid j = 1, 2, \ldots, n\}, \qquad E_E = \{(u_{2j}, u_{2j+1}) \mid j = 1, 2, \ldots, n-1\},$$

and edge weights $w_A(u_i, u_{i+1})$ $(= w_A(u_{i+1}, u_i))$ for $(u_i, u_{i+1}) \in E_A$ are given by $w_A(u_{2j-1}, u_{2j}) = h_j$ for $j = 1, 2, \ldots, n$ and $w_A(u_{2j}, u_{2j+1}) = w(v_j, v_{j+1})$ for $j = 1, 2, \ldots, n-1$. It is assumed that $u_1$ is the left end of $G_A$ and $u_{2n}$ the right end of it. We call edges in $E_V$ and in $E_E$ job-edges and travel-edges, respectively. Each job-edge $(u_{2j-1}, u_{2j}) \in E_V$ corresponds to vertex $v_j \in V$ of the given path $G$, while each travel-edge $(u_{2j}, u_{2j+1}) \in E_E$ corresponds to edge $\{v_j, v_{j+1}\} \in E$ of $G$. Note that the total length of $G_A$ is

$$\sum_{i=1}^{2n-1} w_A(u_i, u_{i+1}) = h_1 + w(v_1, v_2) + h_2 + w(v_2, v_3) + \cdots + h_n = W + H.$$

By viewing $G_A$ as a set of $2n$ points in a 1-dimensional space, where $u_1, u_2, \ldots, u_{2n}$ are located in this order and the distance between $u_i$ and $u_{i+1}$ is $w_A(u_i, u_{i+1})$, we consider any point $a$ (not necessarily a vertex of $G_A$) between $u_1$ and $u_{2n}$ in the space.

For any two points $a$ and $b$ on $G_A$, let $S(a,b)$ denote the segment of $G_A$ between $a$ and $b$, and let $d(a,b)$ denote the length of $S(a,b)$ (i.e., the distance between $a$ and $b$ on $G_A$). For any edge $(u_i, u_{i+1}) \in E_A$, it is assumed that points $u_i$ and $u_{i+1}$ do not belong to the edge; $[u_i, u_{i+1})$ denotes edge $(u_i, u_{i+1})$ together with its left end point $u_i$.

For the $\gamma$ defined by Eq. (3), a set $\Gamma$ of $m' - 1$ points on $G_A$ is referred to as a $\gamma$-splitting point set if, after deleting all points of $\Gamma$ from $G_A$, the auxiliary path $G_A$ is divided into $m'$ segments, each of length at most $\gamma$; i.e., letting $\tau_1, \tau_2, \ldots, \tau_{m'-1}$ be the points of $\Gamma$ ordered according to their distances from the left end $u_1$ of $G_A$, it holds that $d(\tau_k, \tau_{k+1}) = \gamma$ for $k = 0, 1, 2, \ldots, m' - 2$ and $d(\tau_{m'-1}, \tau_{m'}) \le \gamma$, where $m'$ is defined such that

$$(m' - 1)\gamma < W + H \le m'\gamma, \qquad (4)$$

and we set $\tau_0 = u_1$ and $\tau_{m'} = u_{2n}$ for notational convenience.
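With exact arithmetic, the auxiliary path and the $\gamma$-splitting points are easy to compute. The Python sketch below (0-indexed, helper names of our own) places $u_1, \ldots, u_{2n}$ on a line, takes $m'$ from Eq. (4), and reports which half-open edge $[u_i, u_{i+1})$ contains a given point. It assumes $\gamma > 0$.

```python
from bisect import bisect_right
from fractions import Fraction
from math import ceil


def auxiliary_positions(h, w):
    """Coordinates of u_1, ..., u_{2n} on the line (0-indexed list of length 2n).

    pos[2j] and pos[2j+1] are the ends of the job-edge of job j; the
    travel-edge of {v_j, v_{j+1}} runs from pos[2j+1] to pos[2j+2].
    """
    pos = [0]
    for j, hj in enumerate(h):
        pos.append(pos[-1] + hj)            # job-edge of job j
        if j < len(w):
            pos.append(pos[-1] + w[j])      # travel-edge to v_{j+1}
    return pos


def splitting_points(h, w, gamma):
    """tau_1, ..., tau_{m'-1} = gamma, 2*gamma, ... with m' from Eq. (4)."""
    total = sum(h) + sum(w)                 # = W + H
    m_prime = max(1, ceil(Fraction(total) / gamma))
    return [k * gamma for k in range(1, m_prime)]


def locate(pos, tau):
    """Half-open edge containing tau, as ('job' | 'travel', 0-indexed job)."""
    e = bisect_right(pos, tau) - 1          # 0-indexed edge number
    return ("job" if e % 2 == 0 else "travel", e // 2)


pos = auxiliary_positions([0, 1, 1, 0], [1, 0, 1])     # Example 1, B = 1
taus = splitting_points([0, 1, 1, 0], [1, 0, 1], 2)
print(pos, [locate(pos, t) for t in taus])   # [0, 0, 1, 2, 2, 3, 4, 4] [('job', 2)]
```

`bisect_right` skips zero-length edges automatically, so a point always lands in a nonempty half-open edge.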

3.3 A 2-Approximation for Optimal Gapless Schedule

By definition, for any instance $(G, r, h, w, m)$, at most one point of $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_{m'-1}\}$ can fall on each edge $[u_i, u_{i+1})$ of $G_A = (V_A, E_A)$, since consecutive points are at distance $\gamma$ while every edge of $G_A$ has length at most $\gamma$ ($h_j \le LB1$ and $w(v_j, v_{j+1}) \le LB3$). By assigning the jobs corresponding to job-edges fully contained in the segment between $\tau_{k-1}$ and $\tau_k$ to vehicle $k$, and by assigning the job corresponding to a job-edge containing $\tau_k$ to vehicle $k$ or $k+1$, we obtain a zone schedule.

From Eqs. (2) and (3), if $(W+H)/m < \max\{\max_{1 \le j \le n}(r_j + h_j), \max_{1 \le j \le n-1} w(v_j, v_{j+1})\}$, then we reduce the number $m$ of vehicles to $m'$ such that $(m' - 1)\gamma < W + H \le m'\gamma$ (where $m - m'$ vehicles serve no jobs). Notice that this does not change the lower bound $\gamma$. In what follows, we assume that $\gamma = (W+H)/m$ (which implies that $m' = m$).

Using the $\gamma$-splitting point set $\Gamma$, the following algorithm obtains a zone schedule $\pi^{\mathrm{ZONE}}$ for an instance $(G, r, h, w, m)$ of PATH-MVSP.

Algorithm ZONE(G, r, h, w, m; π^ZONE)

Input: A path G = (V, E), where V = {v_1, v_2, ..., v_n} is its set of n vertices and E = {{v_j, v_{j+1}} | j = 1, 2, ..., n-1} is its set of edges, release times r_j for j ∈ J, handling times h_j for j ∈ J, edge weights w(v_j, v_{j+1}) for {v_j, v_{j+1}} ∈ E, and the number of vehicles m.

Output: A zone schedule π^ZONE with C_max(π^ZONE) ≤ 2 · C_max(π*_g) for an optimal gapless schedule π*_g.

Step 1 (Initialization):
    Compute the auxiliary path G_A and its γ-splitting point set Γ = {τ_1, τ_2, ..., τ_{m-1}} from the given path G;
    a(1) := 1; b(m) := n;   /* job 1 is served by the 1st vehicle while job n is served by the m-th vehicle */

Step 2 (Allocating segments of G_A to m vehicles):
    for k = 1, 2, ..., m-1 do
        if τ_k ∈ [u_{2i-1}, u_{2i}) for some job-edge (u_{2i-1}, u_{2i}) of G_A then
            if d(u_{2i-1}, τ_k) > w_A(u_{2i-1}, u_{2i})/2 then
                b(k) := i; a(k+1) := i+1   /* job i is served by the k-th vehicle while job i+1 is served by the (k+1)-th vehicle */
            else
                b(k) := i-1; a(k+1) := i
            end; /* if */
        else   /* τ_k ∈ [u_{2i}, u_{2i+1}) for some travel-edge (u_{2i}, u_{2i+1}) of G_A */
            b(k) := i; a(k+1) := i+1   /* job i is served by the k-th vehicle while job i+1 is served by the (k+1)-th vehicle */
        end; /* if */
    end; /* for */

Step 3 (Scheduling m vehicles in each subpath G(a(k), b(k)), k = 1, 2, ..., m):
    for k = 1, 2, ..., m do
        π^[k] := (a(k), a(k)+1, ..., b(k)); π̄^[k] := (b(k), b(k)-1, ..., a(k));
        Compute the completion times C(π^[k]) and C(π̄^[k]) in G(a(k), b(k));
        if C(π^[k]) > C(π̄^[k]) then
            π^[k] := π̄^[k]
        end; /* if */
    end; /* for */
    π^ZONE := {π^[1], π^[2], ..., π^[m]}
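Algorithm ZONE is short enough to transcribe. The Python sketch below follows Steps 1 to 3 with 0-indexed jobs and exact `Fraction` arithmetic; it also folds in the reduction to $m' \le m$ vehicles described just before the algorithm, which is our reading of the text rather than part of the pseudocode. Function names are illustrative.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate
from math import ceil


def completion_time(seq, r, h, pos_v):
    """Completion time of one vehicle; pos_v[j] is the coordinate of v_j."""
    t, cur = 0, None
    for j in seq:
        if cur is not None:
            t += abs(pos_v[j] - pos_v[cur])   # travel along the path
        t = max(t, r[j]) + h[j]               # wait for release, then serve
        cur = j
    return t


def zone(r, h, w, m):
    """Sketch of algorithm ZONE (0-indexed jobs); returns (schedule, gamma).

    w[j] is the weight of edge {v_j, v_{j+1}}. When LB2 is not the largest
    lower bound, only m' <= m vehicles are used, as described in Section 3.3.
    """
    n = len(r)
    W, H = sum(w), sum(h)
    gamma = max(max(a + b for a, b in zip(r, h)),     # LB1
                Fraction(W + H, m),                    # LB2
                max(w, default=0))                     # LB3
    if gamma == 0:                  # all times are zero: one vehicle suffices
        return [list(range(n))], gamma
    m_used = min(m, max(1, ceil(Fraction(W + H) / gamma)))   # m' from Eq. (4)

    # Step 1: auxiliary path; pos[2j], pos[2j+1] bound the job-edge of job j.
    pos = [0]
    for j in range(n):
        pos.append(pos[-1] + h[j])
        if j < n - 1:
            pos.append(pos[-1] + w[j])
    a, b = [0] * m_used, [0] * m_used
    b[-1] = n - 1

    # Step 2: the k-th splitting point is tau = k * gamma.
    for k in range(1, m_used):
        tau = k * gamma
        e = bisect_right(pos, tau) - 1          # tau lies in [pos[e], pos[e+1])
        i = e // 2
        if e % 2 == 0 and tau - pos[e] <= Fraction(h[i], 2):
            b[k - 1], a[k] = i - 1, i           # first half of job-edge: job i moves right
        else:
            b[k - 1], a[k] = i, i + 1           # job i stays with the left vehicle

    # Step 3: keep the better of the two 1-way orders in every zone.
    pos_v = [0] + list(accumulate(w))
    schedule = []
    for k in range(m_used):
        fwd = list(range(a[k], b[k] + 1))
        bwd = fwd[::-1]
        schedule.append(min(fwd, bwd, key=lambda s: completion_time(s, r, h, pos_v)))
    return schedule, gamma


# Example 1 with B = 1 (0-indexed jobs).
schedule, gamma = zone([0, 0, 1, 2], [0, 1, 1, 0], [1, 0, 1], m=2)
print(schedule, gamma)   # [[0, 1], [2, 3]] 2
```

On Example 1 the sketch returns the zones {1, 2} and {3, 4} (in the paper's 1-indexed numbering) with maximum completion time 3, i.e., $3B$, which is within the bound $LB1 + LB2 = 4$ of Theorem 1. A zone may come out empty when two consecutive splitting points straddle a single long job-edge and travel-edge; such a vehicle simply serves no jobs.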
Let $C_{\max}(\pi^*_g)$ be the minimum of the maximum completion time of a gapless schedule, i.e., the maximum completion time of an optimal gapless schedule $\pi^*_g$ in $(G, r, h, w, m)$. Now the lower bound $\gamma$ is equal to $(W+H)/m$.

Theorem 1. For an instance $(G, r, h, w, m)$ of PATH-MVSP, let $C_{\max}(\pi^*_g)$ be the maximum completion time of an optimal gapless schedule $\pi^*_g$. Then there exists a 1-way zone schedule $\pi^{\mathrm{ZONE}}$ such that

$$C_{\max}(\pi^{\mathrm{ZONE}}) \le LB1 + LB2 \;\bigl(\le 2 \cdot C_{\max}(\pi^*_g)\bigr). \qquad (5)$$

Moreover, such a 1-way zone schedule $\pi^{\mathrm{ZONE}}$ can be found in $O(n)$ time.

Proof. It is not difficult to see that algorithm ZONE runs in $O(n)$ time and outputs a feasible zone schedule $\pi^{\mathrm{ZONE}}$. Hence it suffices to show that $C_{\max}(\pi^{\mathrm{ZONE}}) \le 2 \cdot C_{\max}(\pi^*_g)$ holds. For this, we show that $\min\{C(\pi^{[k]}), C(\bar{\pi}^{[k]})\} \le LB_1 + LB_2$ holds for each vehicle $k = 1, 2, \ldots, m$.

We consider the following three cases.

Case 1. $\tau_{k-1} \notin [u_{2a(k)-1}, u_{2a(k)})$ and $\tau_k \notin [u_{2b(k)-1}, u_{2b(k)})$: Consider the sequence of jobs $\pi^{[k]} = (a(k), a(k)+1, \ldots, b(k))$ on subpath $G(a(k), b(k))$ of the given path $G$. Then $W(a(k), b(k)) + H(a(k), b(k)) = d(u_{2a(k)-1}, u_{2b(k)}) \le d(\tau_{k-1}, \tau_k) = \gamma$. By considering a schedule in which the vehicle starts the sequence of jobs after waiting at vertex $v_{a(k)}$ until time $r_{\max}(a(k), b(k))$, we have

$$C(\pi^{[k]}) \le r_{\max}(a(k), b(k)) + W(a(k), b(k)) + H(a(k), b(k)) \le r_{\max}(a(k), b(k)) + \gamma \le LB_1 + LB_2.$$

Case 2. $\tau_{k-1} \in [u_{2a(k)-1}, u_{2a(k)})$ and $\tau_k \notin [u_{2b(k)-1}, u_{2b(k)})$ (the case of $\tau_{k-1} \notin [u_{2a(k)-1}, u_{2a(k)})$ and $\tau_k \in [u_{2b(k)-1}, u_{2b(k)})$ can be treated symmetrically): In this case, $\tau_{k-1}$ is situated on $[u_{2a(k)-1}, u_{2a(k)})$ with $d(u_{2a(k)-1}, \tau_{k-1}) < w_A(u_{2a(k)-1}, u_{2a(k)})/2$ $(= h_{a(k)}/2)$. Thus $W(a(k), b(k)) + H(a(k), b(k)) = d(u_{2a(k)-1}, u_{2b(k)}) \le d(u_{2a(k)-1}, \tau_{k-1}) + d(\tau_{k-1}, \tau_k) < h_{a(k)}/2 + \gamma$. Consider the sequence of jobs $\pi^{[k]} = (a(k), a(k)+1, \ldots, b(k))$ on subpath $G(a(k), b(k))$ of $G$. If $r_{a(k)} + h_{a(k)} \ge r_{\max}(a(k), b(k))$, then no job in $J(a(k), b(k)) \setminus \{a(k)\}$ requires the vehicle to wait at its vertex after it completes job $a(k)$, implying

$$C(\pi^{[k]}) \le r_{a(k)} + W(a(k), b(k)) + H(a(k), b(k)) < r_{a(k)} + \frac{h_{a(k)}}{2} + \gamma = \frac{r_{a(k)} + h_{a(k)}}{2} + \frac{r_{a(k)}}{2} + \gamma \le \frac{LB_1}{2} + \frac{LB_1}{2} + LB_2.$$

On the other hand, if $r_{a(k)} + h_{a(k)} < r_{\max}(a(k), b(k))$, then by considering a schedule in which the vehicle waits for $r_{\max}(a(k), b(k)) - (r_{a(k)} + h_{a(k)})$ time after serving job $a(k)$, we have

$$\begin{aligned}
C(\pi^{[k]}) &\le r_{a(k)} + h_{a(k)} + \bigl(r_{\max}(a(k), b(k)) - (r_{a(k)} + h_{a(k)})\bigr) + W(a(k), b(k)) + \bigl(H(a(k), b(k)) - h_{a(k)}\bigr) \\
&< r_{\max}(a(k), b(k)) + \frac{h_{a(k)}}{2} + \gamma - h_{a(k)} \\
&\le r_{\max}(a(k), b(k)) + \gamma \le LB_1 + LB_2.
\end{aligned}$$
Case 3. $\tau_{k-1} \in [u_{2a(k)-1}, u_{2a(k)})$ and $\tau_k \in [u_{2b(k)-1}, u_{2b(k)})$: We can assume without loss of generality that $h_{a(k)} \ge h_{b(k)}$. In this case, $\tau_{k-1}$ is situated on $[u_{2a(k)-1}, u_{2a(k)})$ at distance less than $w_A(u_{2a(k)-1}, u_{2a(k)})/2$ $(= h_{a(k)}/2)$ from $u_{2a(k)-1}$, and $\tau_k$ is situated on $[u_{2b(k)-1}, u_{2b(k)})$ at distance greater than $w_A(u_{2b(k)-1}, u_{2b(k)})/2$ $(= h_{b(k)}/2)$ from $u_{2b(k)-1}$. Then

$$W(a(k), b(k)) + H(a(k), b(k)) = d(u_{2a(k)-1}, u_{2b(k)}) = d(u_{2a(k)-1}, \tau_{k-1}) + d(\tau_{k-1}, \tau_k) + d(\tau_k, u_{2b(k)}) < \frac{h_{a(k)}}{2} + \gamma + \frac{h_{b(k)}}{2}.$$

Consider the sequence of jobs $\pi^{[k]} = (a(k), a(k)+1, \ldots, b(k))$ on subpath $G(a(k), b(k))$ of $G$. If $r_{a(k)} + h_{a(k)} \ge r_{\max}(a(k), b(k))$, then no job in $J(a(k), b(k)) \setminus \{a(k)\}$ requires the vehicle to wait at its vertex after it has completed job $a(k)$, indicating

$$C(\pi^{[k]}) \le r_{a(k)} + W(a(k), b(k)) + H(a(k), b(k)) < r_{a(k)} + \frac{h_{a(k)}}{2} + \gamma + \frac{h_{b(k)}}{2} \le r_{a(k)} + h_{a(k)} + \gamma \le LB_1 + LB_2.$$

Otherwise, if $r_{a(k)} + h_{a(k)} < r_{\max}(a(k), b(k))$, then by considering a schedule in which the vehicle waits for $r_{\max}(a(k), b(k)) - (r_{a(k)} + h_{a(k)})$ time after serving job $a(k)$, we have

$$\begin{aligned}
C(\pi^{[k]}) &\le r_{a(k)} + h_{a(k)} + \bigl(r_{\max}(a(k), b(k)) - (r_{a(k)} + h_{a(k)})\bigr) + W(a(k), b(k)) + \bigl(H(a(k), b(k)) - h_{a(k)}\bigr) \\
&< r_{a(k)} + h_{a(k)} + \bigl(r_{\max}(a(k), b(k)) - (r_{a(k)} + h_{a(k)})\bigr) + \frac{h_{a(k)}}{2} + \gamma + \frac{h_{b(k)}}{2} - h_{a(k)} \\
&= r_{\max}(a(k), b(k)) - \frac{h_{a(k)}}{2} + \frac{h_{b(k)}}{2} + \gamma \\
&\le r_{\max}(a(k), b(k)) + \gamma \le LB_1 + LB_2.
\end{aligned}$$

This case analysis shows that every vehicle has a completion time of at most $LB_1 + LB_2$, that is, within twice the maximum completion time of an optimal gapless schedule. This completes the proof. □

4 A 2-Approximation Algorithm for General Schedules

Unfortunately, an optimal schedule $\pi^*$ for a problem instance $(G, r, h, w, m)$ is not always a gapless schedule, and hence $LB_2$ (defined earlier) cannot be used as a lower bound on the minimum of the maximum completion time attained by general schedules. Thus, we need to take into account all configurations of gaps on $G$ that may occur in an optimal schedule. Based on this idea, we prove the next result.

Theorem 2. For an instance $(G, r, h, w, m)$ of PATH-MVSP, let $C^*_{\max}$ be the minimum of the maximum completion time of a schedule. Then there exists a 1-way zone schedule $\pi^{\mathrm{ZONE}}$ with $m - 1$ gaps such that

$$C_{\max}(\pi^{\mathrm{ZONE}}) \le 2 \cdot C^*_{\max}. \qquad (6)$$

Moreover, such a 1-way zone schedule can be found in $O(mn^2)$ time. □
The existence of such a 1-way zone schedule $\pi^{\mathrm{ZONE}}$ in the theorem can be shown as follows. Let $\pi^*$ be an optimal schedule to the problem, which consists of several gapless schedules for subinstances of $G$. Since $m \le n$, we can assume without loss of generality that each of the $m$ vehicles serves at least one job in $\pi^*$. Let $e_1, e_2, \ldots, e_g \in E$ be the gaps in $\pi^*$, and let $G_1, G_2, \ldots, G_{g+1}$ be the maximal subpaths of $G$ induced by non-gap edges, where the jobs in each $G_i$ are served by a gapless schedule $\pi^*(i)$ with $m_i$ vehicles. For each gapless schedule $\pi^*(i)$, we see by Theorem 1 that there is a 1-way zone schedule $\pi_i^{\mathrm{ZONE}}$ of $m' \,(\le m_i)$ vehicles which serves the jobs in $G_i$ and has completion time $C_{\max}(\pi_i^{\mathrm{ZONE}}) \le 2 \cdot C_{\max}(\pi^*(i))$. We can assume $m' = m_i$, since $G_i$ contains at least $m_i$ vertices by the assumption on $\pi^*$; hence, if $m' < m_i$, we can modify $\pi_i^{\mathrm{ZONE}}$ into a zone schedule with $m_i - 1$ gaps without increasing the completion time. Since the completion time $C_{\max}(\pi^*)$ is $\max\{C_{\max}(\pi^*(1)), \ldots, C_{\max}(\pi^*(g+1))\}$, the 1-way zone schedule $\pi^{\mathrm{ZONE}}$ consisting of these 1-way zone schedules $\pi_1^{\mathrm{ZONE}}, \ldots, \pi_{g+1}^{\mathrm{ZONE}}$ ensures the existence of a desired 1-way zone schedule in Theorem 2.

Now we show that such a 1-way zone schedule in Theorem 2 can be computed in $O(mn^2)$ time. For this, it suffices to prove that an optimal 1-way zone schedule $\pi^{\mathrm{OPTZONE}}$ with $m - 1$ gaps can be found in the same time complexity.



Lemma 1. For an instance $(G, r, h, w, m)$ of PATH-MVSP (where edge weights are not necessarily symmetric), an optimal 1-way zone schedule $\pi^{\mathrm{OPTZONE}}$ with $m - 1$ gaps can be found in $O(mn^2)$ time.

Proof. We compute $\pi^{\mathrm{OPTZONE}}$ by the following dynamic programming. For $1 \le i \le j \le n$, let $T(i,j)$ (resp., $T(j,i)$) denote the completion time of a 1-way zone schedule which serves the jobs in $J(i,j)$ from $i$ to $j$ (resp., from $j$ to $i$). For $1 \le i \le j \le n$ and $1 \le \ell \le m$, let $C(i,j,\ell)$ denote the minimum, over all possible configurations of exactly $\ell - 1$ gaps on $G(i,j)$, of the maximum over the resulting subinstances $G(i',j')$ of the minimum completion time of a 1-way zone schedule in $G(i',j')$. Note that $C(1,n,m) = C_{\max}(\pi^{\mathrm{OPTZONE}})$. Hence, $C(1,n,m)$ can be computed in $O(mn^2)$ time by the following dynamic programming.



Step 1: for $i = 1, 2, \ldots, n$ do
    $T(i,i) := r_i + h_i$; $C(i,i,1) := T(i,i)$
end; /* for */
for $k = 1, 2, \ldots, n-1$ do
    for $i = 1, 2, \ldots, n-k$ do
        $j := i + k$;
        $T(i,j) := \max\{T(i,j-1) + w(v_{j-1}, v_j),\, r_j\} + h_j$;
        $T(j,i) := \max\{T(j,i+1) + w(v_{i+1}, v_i),\, r_i\} + h_i$;
        $C(i,j,1) := \min\{T(i,j), T(j,i)\}$
    end; /* for */
end; /* for */

Step 2: for $\ell = 2, 3, \ldots, m$ do
    for $j = \ell, \ell+1, \ldots, n$ do
        $C(1,j,\ell) := \min_{\ell-1 \le j' \le j-1} \max\{C(1,j',\ell-1),\, C(j'+1,j,1)\}$
    end; /* for */
end. /* for */



Notice that by backtracking the computation process, we can construct in $O(mn^2)$ time a 1-way zone schedule $\pi^{\mathrm{OPTZONE}}$ that achieves $C(1,n,m)$.

□
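A compact Python rendering of this dynamic program may help. It is a sketch, not the authors' code: jobs are 0-indexed, `w_right[i]` stands for $w(v_i, v_{i+1})$ and the optional `w_left[i]` for $w(v_{i+1}, v_i)$ (defaulting to symmetric weights), and the function name `optimal_one_way_zone` and its return format are hypothetical. The backtracking step recovers each zone together with its cheaper direction.

```python
from typing import List, Optional, Sequence, Tuple

def optimal_one_way_zone(r: Sequence[float], h: Sequence[float],
                         w_right: Sequence[float], m: int,
                         w_left: Optional[Sequence[float]] = None,
                         ) -> Tuple[float, List[Tuple[int, int, bool]]]:
    """Sketch of the O(m n^2) DP for an optimal 1-way zone schedule.

    Returns (C(1, n, m), zones); each zone is (first, last, forward): jobs
    first..last (0-indexed), served left-to-right iff forward is True.
    """
    n = len(r)
    assert 1 <= m <= n
    if w_left is None:
        w_left = w_right                     # symmetric edge weights
    INF = float("inf")
    # Step 1: T[i][j] (i <= j) serves i..j rightwards; T[j][i] leftwards.
    T = [[INF] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = r[i] + h[i]
    for k in range(1, n):
        for i in range(n - k):
            j = i + k
            T[i][j] = max(T[i][j - 1] + w_right[j - 1], r[j]) + h[j]
            T[j][i] = max(T[j][i + 1] + w_left[i], r[i]) + h[i]

    def one_zone(i: int, j: int) -> float:  # C(i, j, 1)
        return min(T[i][j], T[j][i])

    # Step 2: C[ell][j] = best value for jobs 0..j using exactly ell zones.
    C = [[INF] * n for _ in range(m + 1)]
    last_cut = [[-1] * n for _ in range(m + 1)]
    for j in range(n):
        C[1][j] = one_zone(0, j)
    for ell in range(2, m + 1):
        for j in range(ell - 1, n):
            for jp in range(ell - 2, j):    # zone ell covers jobs jp+1..j
                val = max(C[ell - 1][jp], one_zone(jp + 1, j))
                if val < C[ell][j]:
                    C[ell][j], last_cut[ell][j] = val, jp

    # Backtrack to recover the zones and their directions.
    zones = []
    j = n - 1
    for ell in range(m, 0, -1):
        i = last_cut[ell][j] + 1 if ell > 1 else 0
        zones.append((i, j, T[i][j] <= T[j][i]))
        j = i - 1
    return C[m][n - 1], zones[::-1]

# First two blocks of Example 2 with B = 1000, delta = 10, eps = 1.
value, zones = optimal_one_way_zone(
    r=[11, 0, 13, 11, 0, 13], h=[0, 10, 0, 0, 10, 0],
    w_right=[1, 1, 1000, 1, 1], m=2)
print(value, zones)   # 23 [(0, 2, True), (3, 5, True)]
```

On these two blocks of Example 2 below, the program splits the six jobs into two zones of three jobs each, both served left to right with completion time $2\delta + 3\varepsilon = 23$.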

The next example shows that the bound 2 in Theorem 2 is tight.

Example 2. Let $G = (V, E)$ be a path with $m$ vehicles and $n = 3m$ jobs, where $V = \{v_{3k-2}, v_{3k-1}, v_{3k} \mid k = 1, 2, \ldots, m\}$ and $E = \{\{v_j, v_{j+1}\} \mid j = 1, 2, \ldots, 3m-1\}$. For positive constants $B \gg \delta > \varepsilon > 0$ (and $\delta > 1$), edge weights are given by $w(v_{3k-2}, v_{3k-1}) = \varepsilon$ and $w(v_{3k-1}, v_{3k}) = \varepsilon$ for $k = 1, 2, \ldots, m$, and by $w(v_{3k}, v_{3(k+1)-2}) = B$ for $k = 1, 2, \ldots, m-1$. Release times are given by $r_{3k-2} = \delta + \varepsilon$, $r_{3k-1} = 0$, and $r_{3k} = \delta + 3\varepsilon$, and handling times by $h_{3k-2} = 0$, $h_{3k-1} = \delta$, and $h_{3k} = 0$ for $k = 1, 2, \ldots, m$. □

In this example, the optimal schedule is $\pi^* = \{(2, 1, 3), (5, 4, 6), \ldots, (3m-1, 3m-2, 3m)\}$ with $C^*_{\max} = \delta + 3\varepsilon = r_{\max}$ (note that all the vehicles have the same completion time), while an optimal 1-way zone schedule is $\pi^{\mathrm{OPTZONE}} = \{(1, 2, 3), (4, 5, 6), \ldots, (3m-2, 3m-1, 3m)\}$ with $C_{\max}(\pi^{\mathrm{OPTZONE}}) = 2\delta + 3\varepsilon$ (note that all the vehicles also have the same completion time with respect to $\pi^{\mathrm{OPTZONE}}$). Therefore, we have $C_{\max}(\pi^{\mathrm{OPTZONE}})/C^*_{\max} \to 2$ when $\varepsilon \to 0$.
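The completion times claimed for one block of this example can be checked mechanically. The Python sketch below assumes a single vehicle that starts at the vertex of its first job at time 0, travels along the path at the given edge weights, and waits whenever it arrives before a release time; the helper names `completion_time` and `example2_block` and the coordinate encoding `pos` are illustrative. Exact `Fraction` arithmetic lets the ratio be evaluated for a tiny $\varepsilon$ without rounding noise.

```python
from fractions import Fraction

def completion_time(seq, r, h, pos):
    """Completion time of one vehicle serving jobs in the order `seq`.

    pos[j] is the coordinate of job j's vertex on the path, so the travel
    time between two jobs is the distance between their coordinates.
    """
    t = 0
    prev = None
    for j in seq:
        if prev is not None:
            t += abs(pos[j] - pos[prev])   # travel along the path
        t = max(t, r[j]) + h[j]            # wait for release, then handle
        prev = j
    return t

def example2_block(delta, eps):
    """One block (jobs 3k-2, 3k-1, 3k) of Example 2, 0-indexed."""
    r = [delta + eps, 0, delta + 3 * eps]
    h = [0, delta, 0]
    pos = [0, eps, 2 * eps]
    opt = completion_time([1, 0, 2], r, h, pos)          # order (2, 1, 3)
    zone = min(completion_time([0, 1, 2], r, h, pos),    # order (1, 2, 3)
               completion_time([2, 1, 0], r, h, pos))    # order (3, 2, 1)
    return opt, zone

delta, eps = Fraction(10), Fraction(1, 10**6)
opt, zone = example2_block(delta, eps)
print(opt == delta + 3 * eps, zone == 2 * delta + 3 * eps)   # True True
print(float(zone / opt))                                     # just below 2
```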

5 Conclusion 

In this paper, we discussed a scheduling problem of vehicles on a path with 
release and handling times, PATH-MVSP. The problem asks to find an optimal 
schedule of m vehicles serving n jobs that minimizes the maximum completion 
time of all the jobs. The PATH-MVSP is NP-hard for any fixed m > 2, and 
is NP-hard in the strong sense for m arbitrary. Even when there is a single 
vehicle (i.e., the case of m = 1), the problem is NP-hard if the initial location is 
specified. In this paper, for the PATH-MVSP with symmetric edge weights, we 
presented an approximation algorithm which delivers a zone schedule in $O(mn^2)$
time with the maximum completion time at most twice the optimal. It is left 
open whether or not the PATH-MVSP with asymmetric edge weights admits a 
good approximation algorithm. 
Abstract. We present a simple shortest path algorithm. If the input lengths are positive and uniformly distributed, the algorithm runs in linear time. The worst-case running time of the algorithm is $O(m + n \log C)$, where $n$ and $m$ are the number of vertices and arcs of the input graph, respectively, and $C$ is the ratio of the largest and the smallest nonzero arc length.



1 Introduction 



The shortest path problem with nonnegative arc lengths is very common in practice, and algorithms for this problem have been extensively studied, both from theoretical (e.g., [...]) and computational (e.g., [...]) viewpoints. Efficient implementations of Dijkstra's algorithm, in particular, have received a lot of attention. At each step, Dijkstra's algorithm selects for processing a labeled vertex with the smallest distance label. For a nonnegative length function, this selection rule guarantees that each vertex is scanned at most once.

Suppose that the input graph has $n$ vertices and $m$ arcs. To state some of the previous results, we assume that the input arc lengths are integral. Let $U$ denote the biggest arc length. We define $C$ to be the ratio between $U$ and the smallest nonzero arc length. Note that if the lengths are integral, then $C \le U$. Modulo precision problems and arithmetic operation complexity, our results apply to real-valued arc lengths as well. To simplify comparing time bounds with and without $U$ (or $C$), we make the similarity assumption [...]: $\log U = O(\log n)$.

Several algorithms for the problem have near-linear worst-case running times, although no algorithm has a linear running time if the graph is directed and the computational model is well-established. In the pointer model of computation, the Fibonacci heap data structure of Fredman and Tarjan [...] leads to an $O(m + n \log n)$ implementation of Dijkstra's algorithm. In a RAM model with word operations, the fastest currently known algorithms achieve the following bounds: $O(m + n(\log U \log\log U)^{1/3})$ [...], $O(m + n\sqrt{\log n})$, $O(m \log\log U)$, and $O(m \log\log n)$ [23].

For undirected graphs, Thorup's algorithm [...] has a linear running time in a word RAM model. A constant-time priority queue of [...] yields a linear-time algorithm for directed graphs, but only in a non-standard computation model that is not supported by any currently existing computers.

In a recent paper [...], Meyer gives a shortest path algorithm with a linear average time for input arc lengths drawn independently from a uniform distribution on $[1, \ldots, M]$. He also proves that, under the same conditions, the running time is linear with high probability. Meyer's algorithm may scan some vertices more than once, and its worst-case time bound, $O(nm \log n)$, is far from linear. Both the algorithm and its analysis are complicated.

In this paper we show that an improvement of the multi-level bucket shortest path algorithm of [...] has an average running time that is linear, and a worst-case time of $O(m + n \log C)$. Our average-time bound holds for arc lengths distributed uniformly on $[1, \ldots, M]$. We also show that if the arc lengths are independent, the running time of the algorithm is linear with high probability. Both the algorithm and its analysis are natural and simple. Our algorithm is not an implementation of Dijkstra's algorithm: a vertex selected for scanning is not necessarily a minimum labeled vertex. However, the distance label of the selected vertex is equal to the correct distance, and each vertex is scanned at most once. This relaxation of Dijkstra's algorithm was originally introduced by Dinitz [...], used in its full strength by Thorup [...], and also used in [...]. Our technique can also be used to improve the worst-case running time of the above-mentioned $O(m + n(\log U \log\log U)^{1/3})$ algorithm of Raman to $O(m + n(\log C \log\log C)^{1/3})$. (This new bound also applies if the input arc lengths are real-valued.)

Our results advance understanding of near-linear shortest path algorithms. 
Since many computational studies use graphs with uniform arc lengths, these re- 
sults show that such problems are easy in a certain sense. Although we prove our 
results for the uniform arc length distribution, our algorithm achieves improved 
bounds on some other distributions as well. Our results may have practical implications in addition to theoretical ones. The multi-level bucket algorithm works well in practice [...], and our improvement of this algorithm is natural and easy
to implement. It is possible that a variant of our algorithm is competitive with 
the current state-of-the-art implementations on all inputs while outperforming 
these implementations on some inputs. Although our algorithm looks attractive, 
the competing implementations are highly optimized and practicality of our re- 
sults cannot be claimed without a careful implementation and experimentation. 

2 Preliminaries 

The input to the shortest path problem we consider is a directed graph $G = (V, A)$ with $n$ vertices, $m$ arcs, a source vertex $s$, and nonnegative arc lengths $\ell(a)$. The goal is to find shortest paths from the source to all vertices of the graph. We assume that arc lengths are integers in the interval $[1, \ldots, U]$, where $U$ denotes the biggest arc length. Let $\delta$ be the smallest nonzero arc length and let $C$ be the ratio of the biggest arc length to $\delta$. If all arc lengths are zero or if $C < 2$, then the problem can be solved in linear time [...]; without loss of generality, we assume that $C \ge 2$ (and $\log C \ge 1$). This implies $\log U \ge 1$. We say that a statement holds with high probability (w.h.p.) if the probability that the statement is true approaches one as $m \to \infty$.

We assume the word RAM model of computation (see e.g., [...]). Our data structures need array addressing and the following unit-time word operations: addition, subtraction, comparison, and arbitrary shifts. To allow a higher-level description of our algorithm, we use a strong RAM computation model that also allows word operations including bitwise logical operations and the operation of finding the index of the most significant bit in which two words differ. The latter operation is in $AC^0$; see [...] for a discussion of a closely related operation. The use of this more powerful model does not improve the amortized operation bounds, but simplifies the description.

Our shortest path algorithm uses a variant of the multi-level bucket data structure of Denardo and Fox [...]. Although we describe our result in the strong RAM model, following [...], one can also follow [...] and obtain an implementation of the algorithm in the weaker word RAM model. Although somewhat more complicated to describe formally, the latter implementation appears more practical.



3 Labeling Method and Related Results

The labeling method for the shortest path problem [...] works as follows (see e.g., [...]). The method maintains for every vertex $v$ its distance label $d(v)$, parent $p(v)$, and status $S(v) \in \{\text{unreached}, \text{labeled}, \text{scanned}\}$. Initially $d(v) = \infty$, $p(v) = \text{nil}$, and $S(v) = \text{unreached}$. The method starts by setting $d(s) = 0$ and $S(s) = \text{labeled}$. While there are labeled vertices, the method picks such a vertex $v$, scans all arcs out of $v$, and sets $S(v) = \text{scanned}$. To scan an arc $(v, w)$, one checks if $d(w) > d(v) + \ell(v, w)$ and, if true, sets $d(w) = d(v) + \ell(v, w)$, $p(w) = v$, and $S(w) = \text{labeled}$.
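As a concrete reference point, here is a minimal Python sketch of the labeling method with a pluggable selection rule. The function and rule names are illustrative, and selecting by a linear scan over the labeled set is chosen for clarity rather than speed. With Dijkstra's rule (smallest label) every vertex is scanned at most once; with a deliberately poor rule that picks the largest label, some vertices are scanned repeatedly, but for nonnegative lengths the final labels are still correct.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, int]]]
Rule = Callable[[Set[Hashable], Dict[Hashable, float]], Hashable]

def labeling_method(graph: Graph, s: Hashable, select: Rule):
    """Generic labeling method; `select` picks the next labeled vertex."""
    d = {v: float("inf") for v in graph}
    p = {v: None for v in graph}
    status = {v: "unreached" for v in graph}
    scans = {v: 0 for v in graph}
    d[s], status[s] = 0, "labeled"
    labeled = {s}
    while labeled:
        v = select(labeled, d)
        labeled.remove(v)
        status[v] = "scanned"
        scans[v] += 1
        for w, length in graph[v]:          # scan arc (v, w)
            if d[w] > d[v] + length:
                d[w], p[w], status[w] = d[v] + length, v, "labeled"
                labeled.add(w)
    return d, p, scans

def dijkstra_rule(labeled, d):
    return min(labeled, key=lambda v: d[v])

def largest_label_rule(labeled, d):
    return max(labeled, key=lambda v: d[v])

g = {"s": [("a", 1), ("b", 10)], "a": [("b", 1)], "b": [("c", 1)], "c": []}
d1, _, scans1 = labeling_method(g, "s", dijkstra_rule)
d2, _, scans2 = labeling_method(g, "s", largest_label_rule)
print(d1 == d2, max(scans1.values()), max(scans2.values()))   # True 1 2
```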

If the length function is nonnegative, the labeling method always terminates with correct shortest path distances and a shortest path tree. The efficiency of the method depends on the rule used to choose a vertex to scan next. We say that $d(v)$ is exact if the distance from $s$ to $v$ is equal to $d(v)$. It is easy to see that if the method always selects a vertex $v$ such that, at the selection time, $d(v)$ is exact, then each vertex is scanned at most once.

Dijkstra [...] observed that if $\ell$ is nonnegative and $v$ is a labeled vertex with the smallest distance label, then $d(v)$ is exact. However, a linear-time implementation of Dijkstra's algorithm in the strong RAM model appears to be hard. Dinitz [...] and Thorup [...] use a relaxation of Dijkstra's selection rule to get linear-time algorithms for special cases of the shortest path problem. To describe this relaxation, we define the caliber of a vertex $v$, $c(v)$, to be the minimum length of an arc entering $v$, or infinity if no arc enters $v$. The following caliber lemma is implicit in [...].

Lemma 1. Suppose $\ell$ is nonnegative and let $\mu$ be a lower bound on distance labels of labeled vertices. Let $v$ be a vertex such that $\mu + c(v) \ge d(v)$. Then $d(v)$ is exact.
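The caliber test is cheap to apply. The following Python sketch (helper names are illustrative) computes $c(v)$ for every vertex in one pass over the arcs and evaluates the condition of Lemma 1 on a three-vertex example. After $s$ is scanned, $a$ and $b$ are labeled with $d(a) = 4$ and $d(b) = 5$, so $\mu = 4$ is a valid lower bound; since $c(b) = 3$ and $4 + 3 \ge 5$, the lemma certifies $d(b)$ as exact even though $b$ does not have the minimum label.

```python
import math
from typing import Dict, Hashable, List, Tuple

def calibers(graph: Dict[Hashable, List[Tuple[Hashable, int]]]) -> Dict[Hashable, float]:
    """c(v) = minimum length of an arc entering v, or infinity if none."""
    c = {v: math.inf for v in graph}
    for arcs in graph.values():
        for w, length in arcs:
            c[w] = min(c[w], length)
    return c

def caliber_says_exact(mu: float, c_v: float, d_v: float) -> bool:
    """Condition of the caliber lemma: mu + c(v) >= d(v)."""
    return mu + c_v >= d_v

g = {"s": [("a", 4), ("b", 5)], "a": [("b", 3)], "b": []}
c = calibers(g)
print(c)                                   # {'s': inf, 'a': 4, 'b': 3}
print(caliber_says_exact(4, c["b"], 5))    # True: d(b) = 5 is exact
```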
4 Algorithm Description and Correctness

Our algorithm is based on the multi-level bucket implementation [...] of Dijkstra's algorithm, but we use Lemma 1 to detect and scan vertices with exact (but not necessarily minimum) distance labels. Our algorithm is a labeling algorithm. During the initialization, the algorithm also computes $c(v)$ for every vertex $v$. Our algorithm keeps labeled vertices in one of two places: a set $F$ and a priority queue $B$. The former is implemented to allow constant-time additions and deletions, for example as a doubly linked list. The latter is implemented using multi-level buckets as described below. The priority queue supports operations insert, delete, decrease-key, and extract-min. However, the insert operation may insert the vertex into $B$ or $F$, and the decrease-key operation may move the vertex from $B$ to $F$.

At a high level, the algorithm works as follows. Vertices in $F$ have exact distance labels, and if $F$ is nonempty, we remove and scan a vertex from $F$. If $F$ is empty, we remove and scan a vertex from $B$ with the minimum distance label. Suppose a distance label of a vertex $u$ decreases. Note that $u$ cannot belong to $F$. If $u$ belongs to $B$, then we apply the decrease-key operation to $u$. This operation either relocates $u$ within $B$ or discovers that $u$'s distance label is exact and moves $u$ to $F$. If $u$ was neither in $B$ nor in $F$, we apply the insert operation to $u$, and $u$ is inserted either into $B$ or, if $d(u)$ is determined to be exact, into $F$.

The bucket structure $B$ contains $k + 1$ levels of buckets, where $k = \lceil \log U \rceil$. Except for the top level, a level contains two buckets. Conceptually, the top level contains infinitely many buckets. However, at most three consecutive top-level buckets can be nonempty at any given time (Lemma 2 below), and one can maintain only these buckets by wrapping around modulo three at the top level. (For low-level efficiency, one may want to wrap around modulo four, which is a power of two.)

We denote bucket $j$ at level $i$ by $B(i, j)$; $i$ ranges from 0 (bottom level) to $k$ (top), and $j$ ranges from 0 to 1, except at the top level discussed above. A bucket contains a set of vertices maintained in a way that allows constant-time insertion and deletion, e.g., in a doubly linked list. At each level $i$, we maintain the number of vertices at this level.

We maintain $\mu$ such that $\mu$ is a lower bound on the distance labels of labeled vertices. Initially $\mu = 0$. Every time an extract-min operation removes a vertex $v$ from $B$, we set $\mu = d(v)$. Consider the binary representation of the distance labels and number bit positions starting from 0 for the least significant bit. Let $\mu_{i..j}$ denote the $i$-th through $j$-th least significant bits of $\mu$ and let $\mu_i$ denote the $i$-th least significant bit. Similarly, $d_i(u)$ denotes the $i$-th least significant bit of $d(u)$, and likewise for the other definitions. Note that $\mu$ and the $k + 1$ least significant bits of the binary representation of $d(u)$ uniquely determine $d(u)$:

$$d(u) = \begin{cases} \mu + (d_{0..k}(u) - \mu_{0..k}) & \text{if } d_{0..k}(u) \ge \mu_{0..k}, \\ \mu + 2^{k+1} + (d_{0..k}(u) - \mu_{0..k}) & \text{otherwise.} \end{cases}$$

For a given $\mu$, let $\mu^i$ and $\bar{\mu}^i$ be $\mu$ with the $i$ least significant bits replaced by 0 or 1, respectively. Each level $i < k$ corresponds to the range of values $[\mu^{i+1}, \bar{\mu}^{i+1}]$.
Fig. 1. Multi-level bucket example, $k = 3$, $\mu = 10$. Values on the bottom are in decimal. Values on top are in binary, with the least significant bit on the bottom. Shaded bits determine positions of the corresponding elements.



Each bucket $B(i, j)$ at a level $i < k$ corresponds to the subrange of that level's range containing all integers whose $i$-th bit is equal to $j$. At the top level, a bucket $B(k, j)$ corresponds to the range $[j \cdot 2^k, (j + 1) \cdot 2^k)$. The width of a bucket at level $i$ is equal to $2^i$: the bucket contains $2^i$ distinct values. We say that a vertex $u$ is in the range of $B(i, j)$ if $d(u)$ belongs to the range corresponding to the bucket.

The position of a vertex $u$ in $B$ depends on $\mu$: $u$ belongs to the lowest-level bucket containing $d(u)$. More formally, let $i$ be the index of the most significant bit in which $d(u)$ and $\mu$ differ, or 0 if they match. Note that $\mu^{i+1} \le d(u) \le \bar{\mu}^{i+1}$. Given $\mu$ and $u$ with $d(u) \ge \mu$, we define the position of $u$ by $(i, d_i(u))$ if $i < k$ and $(k, \lfloor d(u)/2^k \rfloor)$ otherwise. If $u$ is inserted into $B$, it is inserted into $B(i, j)$, where $(i, j)$ is the position of $u$. For each vertex in $B$, we store its position.

Figure 1 gives an example of the bucket structure. In this example, $k = 3$ and $\mu = 10$. For instance, to find the position of a vertex $v$ with $d(v) = 14$, we note that the binary representations of 10 and 14 differ in bit 2 (remember that we start counting from 0) and the bit value is 1. Thus $v$ belongs to bucket 1 at level 2.
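The position rule maps directly onto the strong RAM operation of Section 2: the most significant bit in which two words differ is the highest set bit of their XOR. The Python sketch below (the function name `position` is illustrative) uses `int.bit_length` for that step and, as an assumption of this sketch, numbers top-level buckets by $\lfloor d(u)/2^k \rfloor$ without the modulo-three wrap-around. It reproduces the worked example above.

```python
def position(mu: int, d: int, k: int) -> tuple:
    """Bucket position (level, j) of a vertex with label d, given mu <= d."""
    if not 0 <= mu <= d:
        raise ValueError("labels of vertices in B are at least mu")
    diff = d ^ mu
    i = diff.bit_length() - 1 if diff else 0   # most significant differing bit
    if i < k:
        return (i, (d >> i) & 1)               # B(i, d_i(u))
    return (k, d >> k)                         # top level, range [j*2^k, (j+1)*2^k)

print(position(10, 14, 3))   # (2, 1): 1010 vs 1110 differ in bit 2
print(position(10, 17, 3))   # (3, 2): 17 lies in [16, 24)
```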

Our modification of the multi-level bucket algorithm uses Lemma 1 during the insert operation to put vertices into $F$ whenever the lemma allows it. The details are as follows.

insert: Insert a vertex $u$ into $B \cup F$ as follows. If $\mu + c(u) \ge d(u)$, put $u$ into $F$. Otherwise compute its position $(i, j)$ in $B$ and add $u$ to $B(i, j)$.

decrease-key: Decrease the key of an element $u$ in position $(i, j)$ as follows. Remove $u$ from $B(i, j)$. Set $d(u)$ to the new value and insert $u$ as described above.

extract-min: Find the lowest nonempty level $i$. Find $j$, the first nonempty bucket at level $i$. If $i = 0$, delete a vertex $u$ from $B(i, j)$. (In this case $\mu = d(u)$.) Return $u$. If $i > 0$, examine all elements of $B(i, j)$ and delete a minimum element $u$ from $B(i, j)$. Note that in this case $\mu < d(u)$; set $\mu = d(u)$. Since $\mu$ increased, some vertex positions in $B$ may have changed. We do bucket expansion of $B(i, j)$ and return $u$.

To understand bucket expansion, note that the vertices with changed positions are exactly those in $B(i, j)$. To see this, let $\mu'$ be the old value of $\mu$ and consider a vertex $v$ in $B$. Let $(i', j')$ be $v$'s position with respect to $\mu'$. By the choice of $B(i, j)$, if $(i, j) \ne (i', j')$, then either $i < i'$, or $i = i'$ and $j < j'$. In both cases, the common prefix of $\mu'$ and $d(v)$ is the same as the common prefix of $d(u)$ and $d(v)$, and the position of $v$ does not change.

On the other hand, vertices in $B(i, j)$ have a longer common prefix with $d(u)$ than they have with $\mu'$, and these vertices need to move to a lower level. Bucket expansion deletes these vertices from $B(i, j)$ and uses the insert operation to add the vertices back into $B$ or into $F$, as appropriate.

Although the formal description of the algorithm is a little complicated, the 
algorithm itself is relatively simple: At each step, remove a vertex from F; or, if 
F is empty, then remove the minimum-labeled vertex from B. In the latter case, 
expand the bucket from which the vertex has been removed, if necessary. Scan 
the vertex and update its neighbors if necessary. Terminate when both F and B 
are empty. 

Note that we do bucket expansions only when $F$ is empty and the expanded bucket contains a labeled vertex with the minimum distance. Thus $\mu$ is updated correctly.
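Putting the pieces together, the following Python sketch runs the whole algorithm on small graphs. It is a simplified illustration rather than the paper's data structure: buckets are Python sets keyed by $(i, j)$, the lowest nonempty bucket is found by a linear scan over keys instead of the constant-time word operations of Lemma 3, top-level buckets are indexed by $\lfloor d/2^k \rfloor$ without wrap-around, and arc lengths are assumed to be positive integers. The function name `caliber_bucket_sp` is hypothetical. It does follow the described control flow: scan from $F$ when possible, otherwise extract the minimum from $B$, set $\mu$, and expand that bucket.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def caliber_bucket_sp(graph: Dict[Hashable, List[Tuple[Hashable, int]]], s: Hashable):
    """Simplified multi-level bucket SSSP with the caliber set F (sketch)."""
    U = max((l for arcs in graph.values() for _, l in arcs), default=1)
    k = max(1, (U - 1).bit_length())             # k = ceil(log2 U), at least 1
    c = {v: math.inf for v in graph}             # calibers
    for arcs in graph.values():
        for w, l in arcs:
            c[w] = min(c[w], l)
    d = {v: math.inf for v in graph}
    d[s] = 0
    mu = 0                                       # lower bound on labeled labels
    F: List[Hashable] = []                       # labeled vertices with exact labels
    B = defaultdict(set)                         # (level, j) -> vertices
    where: Dict[Hashable, Tuple[int, int]] = {}  # vertex -> its bucket in B

    def pos(x: int) -> Tuple[int, int]:
        diff = x ^ mu
        i = diff.bit_length() - 1 if diff else 0
        return (i, (x >> i) & 1) if i < k else (k, x >> k)

    def insert(u):
        if mu + c[u] >= d[u]:                    # caliber lemma: d(u) is exact
            F.append(u)
        else:
            where[u] = pos(d[u])
            B[where[u]].add(u)

    def extract_min():
        nonlocal mu
        key = min(kk for kk, vs in B.items() if vs)   # lowest level, first bucket
        u = min(B[key], key=lambda x: d[x])
        B[key].remove(u)
        del where[u]
        mu = d[u]
        rest = list(B[key])                      # bucket expansion
        B[key].clear()
        for x in rest:
            del where[x]
            insert(x)
        return u

    insert(s)
    while F or any(B.values()):
        u = F.pop() if F else extract_min()
        for w, l in graph[u]:                    # scan u
            if d[u] + l < d[w]:
                if w in where:                   # decrease-key: leave old bucket
                    B[where.pop(w)].discard(w)
                d[w] = d[u] + l
                insert(w)
    return d

g = {"s": [("a", 4), ("b", 5), ("c", 9)], "a": [("b", 3), ("c", 2)],
     "b": [("c", 1), ("d", 7)], "c": [("d", 6)], "d": [], "e": [("d", 1)]}
print(caliber_bucket_sp(g, "s"))
# {'s': 0, 'a': 4, 'b': 5, 'c': 6, 'd': 12, 'e': inf}
```

In this run, $c$ enters $F$ during the expansion of bucket $B(2, 1)$: once $\mu = 5$, the test $5 + c(c) \ge 6$ succeeds, so $c$ is scanned without ever being a minimum of $B$.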

In the original multi-level bucket algorithm, at any point of the execution all labeled vertices are contained in at most two consecutive top level buckets. A slightly weaker result holds for our algorithm.

Lemma 2. At any point of the execution, all labeled vertices are in the range of at most three consecutive top level buckets.

Proof. Let $\mu'$ be the current value of $\mu$ and let $B(k, j)$ be the top level bucket containing $\mu'$. Except for $s$ (for which the result holds trivially), a vertex $v$ becomes labeled during a scan of another vertex $u$ removed from either $B$ or $F$. In the former case, at the time of the scan $d(u) = \mu \le \mu'$, $d(v) = \mu + \ell(u, v) \le \mu' + 2^k$, and therefore $v$ is contained either in $B(k, j)$ or $B(k, j + 1)$. In the latter case, when $u$ has been added to $F$, the difference between $d(u)$ and $\mu$ was at most $c(u) \le 2^k$; thus $d(u) \le \mu' + 2^k$, $d(v) \le d(u) + 2^k \le \mu' + 2 \cdot 2^k$, and thus $v$ belongs to $B(k, j)$, $B(k, j + 1)$, or $B(k, j + 2)$.

Algorithm correctness follows from Lemmas 1 and 2 and the observations that $\mu$ is always set to a minimum distance label of a labeled vertex, $\mu$ remains a lower bound on the labeled vertex labels (and therefore is monotonically nondecreasing), and $F$ always contains vertices with exact distance labels.

5 Worst-Case Analysis 

In this section we prove a worst-case bound on the running time of the algorithm. Some definitions and lemmas introduced in this section will also be used in the next section.
We start the analysis with the following lemmas.

Lemma 3 ([...]).

- Given $\mu$ and $u$, we can compute the position of $u$ with respect to $\mu$ in constant time.

- We can find the lowest nonempty level of $B$ in $O(1)$ time.



Lemma 4. The algorithm runs in $O(m + n + \Phi)$ time, where $\Phi$ is the total number of times a vertex moves from a bucket of $B$ to a lower level bucket.

Proof. Since each vertex is scanned at most once, the total scan time is $O(m + n)$. A vertex is added to and deleted from $F$ at most once, so the total time devoted to maintaining $F$ is $O(n)$. An insert operation takes constant time, and these operations are caused by inserting vertices into $B$ for the first time, by decrease-key operations, and by extract-min operations. The former take $O(n)$ time; we account for the remaining ones jointly with the other operations. A decrease-key operation takes constant time and is caused by a decrease of $d(v)$ due to a scan of an arc $(u, v)$. Since an arc is scanned at most once, these operations take $O(m)$ total time. The work we accounted for so far is linear.

Next we consider the extract-min operations. Consider an extract-min operation that returns $u$. The operation takes $O(1)$ time plus an amount of time proportional to the number of vertices in the expanded bucket, excluding $u$. Each of these vertices moves to a lower level in $B$. Thus we get the desired time bound.

The $O(m + n \log U)$ worst-case time bound is easy to see. To show a better bound, we define $k' = \lfloor \log \delta \rfloor$.

Lemma 5. Buckets at level $k'$ and below are never used.

Proof. Let $(i, j)$ be the position of a vertex $v$ of caliber $c(v) \ge \delta$. If $i < k'$, then $d(v) - \mu < 2^{i+1} \le 2^{k'} \le \delta \le c(v)$, and the algorithm adds $v$ to $F$, not $B$.

The above lemma implies the following bound. 

Theorem 1. The worst-case running time of the algorithm is $O(m + n \log C)$.

Note that the lemma also implies that the algorithm needs only $O(\log C)$ words of memory to implement the bucket data structure.
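For completeness, the counting behind Theorem 1 can be written out as follows. This derivation is a sketch that relies on one fact the text leaves implicit, namely that a vertex never moves to a higher level of $B$ (a smaller label that is still at least $\mu$ shares at least as long a prefix with $\mu$). By Lemma 5 a vertex in $B$ occupies only levels $k' + 1, \ldots, k$, so it moves down fewer than $k - k'$ times:

```latex
\begin{aligned}
\Phi &\le n\,(k - k'), \\
k - k' &= \lceil \log U \rceil - \lfloor \log \delta \rfloor
       \le (\log U + 1) - (\log \delta - 1) = \log C + 2, \\
\text{time} &= O(m + n + \Phi) = O\bigl(m + n(\log C + 3)\bigr) = O(m + n \log C),
\end{aligned}
```

where the last step uses $\log C \ge 1$ from Section 2.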

Our optimization can also be used to improve other data structures based on multi-level buckets, such as radix heaps [2] and hot queues [...]. For these data structures, the equivalent of Lemma 5 allows one to replace the time bound parameter $U$ by $C$. In particular, the bound of the hot queue implementation of Raman [...] improves to $O(m + n(\log C \log\log C)^{1/3})$. The modification of Raman's algorithm to obtain this bound is straightforward given the results of the current section.
6 Average-Case Analysis

In this section we prove linear-time average and high-probability time bounds for our algorithm under the assumption that the input arc lengths are uniformly distributed on $[1, \ldots, M]$.¹

A key lemma for our analysis is as follows.

Lemma 6. The algorithm never inserts a vertex $v$ into a bucket at a level less than or equal to $\log c(v) - 1$.

Proof. Suppose during an insert operation $v$'s position in $B$ is $(i, j)$ with $i \le \log c(v) - 1$. Then the most significant bit in which $d(v)$ and $\mu$ differ is bit $i$, and $d(v) - \mu < 2^{i+1} \le c(v)$. Therefore insert puts $v$ into $F$, not $B$.

The above lemma motivates the following definitions. The weight of an arc $a$, $w(a)$, is defined by $w(a) = k - \lfloor \log \ell(a) \rfloor$. The weight of a vertex $v$, $w(v)$, is defined to be the maximum weight of an incoming arc, or zero if $v$ has no incoming arcs. Lemma 6 implies that the number of times $v$ can move to a lower level of $B$ is at most $w(v) + 1$, and therefore $\Phi \le m + \sum_v w(v)$. Note that because $k$ depends on the input, the weights are defined with respect to a given input.

For the probability distribution of arc lengths defined above, we have $\Pr[\lfloor \log \ell(a) \rfloor = i] = 2^i/M$ for $i = 0, \ldots, k - 1$. The definition of $w$ yields

$$\Pr[w(a) = t] = 2^{k-t}/M \quad \text{for } t = 1, \ldots, k. \qquad (1)$$

Since $M \ge U$, we have $M > 2^{k-1}$, and therefore

$$\Pr[w(a) = t] \le 2^{-t+1} \quad \text{for } t = 1, \ldots, k. \qquad (2)$$

Theorem 2. If arc lengths are uniformly distributed on $[1, \ldots, M]$, then the average running time of the algorithm is linear.

Proof. Since $\Phi \le m + \sum_v w(v)$, it is enough to show that $E[\sum_v w(v)] = O(m)$. By the linearity of expectation and the definition of $w(v)$, we have $E[\sum_v w(v)] \le \sum_a E[w(a)]$. The expected value of $w(a)$ is

$$E[w(a)] = \sum_{i=1}^{k} i \Pr[w(a) = i] \le \sum_{i=1}^{\infty} i\, 2^{-i+1} = 2 \sum_{i=1}^{\infty} i\, 2^{-i} = O(1).$$

Note that this bound holds for any $k$. Thus $E[\sum_a w(a)] = O(m)$.
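The constant in this bound can be checked exactly for a given $M$. The Python sketch below (function name illustrative) assumes, as a simplification, that $U = M$, so that $k = \lceil \log_2 M \rceil$, and computes $E[w(a)]$ for $\ell(a)$ uniform on $[1, \ldots, M]$ by grouping lengths with the same $\lfloor \log_2 \ell \rfloor$. For $M$ a power of two the result equals $\sum_{t=1}^{k} t\,2^{-t} < 2$, and for every $M$ it stays below the bound $\sum_{t \ge 1} t\,2^{-t+1} = 4$ implied by (2).

```python
def expected_arc_weight(M: int) -> float:
    """E[w(a)] for l(a) uniform on {1, ..., M}, assuming U = M."""
    k = (M - 1).bit_length()                 # ceil(log2 M) for M >= 1
    total = 0
    i = 0
    while (1 << i) <= M:
        lo, hi = 1 << i, min((1 << (i + 1)) - 1, M)   # lengths with floor(log2 l) = i
        total += (hi - lo + 1) * (k - i)               # each has weight k - i
        i += 1
    return total / M

print(expected_arc_weight(2**10))                      # 1.98828125
print(max(expected_arc_weight(M) for M in range(1, 5000)))
```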

Remark. The proof of the theorem works for any arc length distribution such that $E[w(a)] = O(1)$. In particular, the theorem holds for real-valued arc lengths selected independently and uniformly from $[0, 1]$. In fact, for this distribution the high-probability analysis below is simpler. However, the integer distribution is somewhat more interesting, for example because some test problem generators use this distribution.

¹ As we shall see, if $M$ is large enough then the result also applies to the range $[0, \ldots, M]$.

Remark. Note that Theorem 2 does not require arc lengths to be independent. Our proof of its high-probability variant, Theorem 3, requires independence.

Next we show that the running time of the algorithm is linear w.h.p. by showing that $\sum_a w(a) = O(m)$ w.h.p. First, we show that w.h.p. $U$ is not much smaller than $M$ and $\delta$ is close to $M m^{-1}$ (Lemmas 7 and 8). Let $S_t$ be the set of all arcs of weight $t$ and note that $\sum_a w(a) = \sum_t t|S_t|$. We show that as $t$ increases, the expected value of $|S_t|$ goes down exponentially. For small values of $t$, this is also true w.h.p. To deal with large values of $t$, we show that the total number of arcs with large weights is small, and so is the contribution of these arcs to the sum of arc weights.

Because of the space limit, we omit the proofs of the following two lemmas.

Lemma 7. W.h.p., $U \ge M/2$.

Lemma 8. W.h.p., $\delta \ge M m^{-4/3}$. If M ≥ … then w.h.p. δ ≤ ….

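From (1) and the independence of arc weights, we have $E[|S_t|] = m 2^{k-t}/M$. By the Chernoff bound (see e.g. [7, 25]), $\Pr[|S_t| \ge 2m2^{k-t}/M] \le (e/4)^{m2^{k-t}/M}$. Since $M \ge 2^{k-1}$, we have $E[|S_t|] \le 2m2^{-t}$, and applying the Chernoff bound with this upper bound on the mean gives

$$\Pr\bigl[|S_t| > 4m2^{-t}\bigr] \le \left(\frac{e}{4}\right)^{2m2^{-t}}.$$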
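Because arc weights are independent, $|S_t|$ is a binomial random variable, so the Chernoff bound $\Pr[X \ge 2\mu] \le (e/4)^{\mu}$ used above can be compared with an exact binomial tail. The sketch below is illustrative. The helper names are ours, and the tail is summed in log space to avoid overflowing the binomial coefficients.

```python
import math


def binomial_upper_tail(n, p, threshold):
    """Exact Pr[X >= threshold] for X ~ Binomial(n, p), 0 < p < 1."""
    total = 0.0
    for j in range(math.ceil(threshold), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def chernoff_bound(mu):
    """Chernoff bound Pr[X >= 2*mu] <= (e/4)**mu (the case delta = 1)."""
    return (math.e / 4) ** mu


if __name__ == "__main__":
    m, p = 1000, 0.01          # m arcs, each of weight t with probability p
    mu = m * p
    print(binomial_upper_tail(m, p, 2 * mu), chernoff_bound(mu))
```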

As mentioned above, we bound the contributions of arcs with large and small weights to $\sum_a w(a)$ differently. We define $\beta = \log(m^{1/3})$ and partition A into two sets: $A_1$, containing the arcs with $w(a) < \beta$, and $A_2$, containing the arcs with $w(a) \ge \beta$.

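Lemma 9. $\sum_{a \in A_1} w(a) = O(m)$ w.h.p.

Proof. Assume that $\delta \ge M m^{-4/3}$ and $U \ge M/2$; by Lemmas 7 and 8 this happens w.h.p. This assumption implies $C \le m^{4/3}$. By the union bound and the fact that the probability is maximized for $t = \beta$, the probability that $|S_t| > 4m2^{-t}$ for some t with $1 \le t < \beta$ is less than

$$\beta \left(\frac{e}{4}\right)^{2m^{2/3}} \le \log(m^{1/3}) \left(\frac{e}{4}\right)^{2m^{2/3}} < \log m \left(\frac{e}{4}\right)^{2m^{2/3}} \to 0 \quad \text{as } m \to \infty.$$

Thus w.h.p., for all t with $1 \le t < \beta$, we have $|S_t| \le 4m2^{-t}$ and

$$\sum_{a \in A_1} w(a) = \sum_{t < \beta} t\,|S_t| \le 4m \sum_{t=1}^{\infty} t\,2^{-t} = O(m).$$ □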
Lemma 10. $\sum_{a \in A_2} w(a) = O(m)$ w.h.p.

Proof. If M < … then k < β and $A_2$ is empty, so the lemma holds trivially.

Now consider the case M ≥ …. By Lemmas 7 and 8, w.h.p. $M m^{-4/3} \le \delta \le$ … and $U \ge M/2$; assume that this is the case. The assumption implies … $\le C \le m^{4/3}$. Under this assumption, we also have $2^{k-1} \le M \le 2^{k+1}$. Combining this with (1), we get $2^{-1-t} \le \Pr[w(a) = t] \le 2^{1-t}$. This implies that $2^{-2-\beta} \le \Pr[w(a) \ge \beta] \le 2^{2-\beta}$, and therefore

$$\frac{m^{-1/3}}{8} \le \Pr[w(a) \ge \beta] \le 4m^{-1/3},$$

and, by the independence of arc weights,

$$\frac{m^{2/3}}{8} \le E[|A_2|] \le 4m^{2/3}.$$

By the Chernoff bound,

$$\Pr\bigl[|A_2| > 2E[|A_2|]\bigr] < \left(\frac{e}{4}\right)^{E[|A_2|]}.$$

Replacing the first occurrence of $E[|A_2|]$ by the upper bound on its value and the second occurrence by the lower bound (since e/4 < 1), we get

$$\Pr\bigl[|A_2| > 8m^{2/3}\bigr] < \left(\frac{e}{4}\right)^{m^{2/3}/8} \to 0 \quad \text{as } m \to \infty.$$

For all arcs a, $\ell(a) \ge \delta$, and thus

$$w(a) = k - \lfloor \log \ell(a) \rfloor \le 1 + \log U + 1 - \log \delta = 2 + \log C \le 2 + \tfrac{4}{3}\log m.$$

Therefore, w.h.p.,

$$\sum_{a \in A_2} w(a) \le 8m^{2/3}\left(2 + \tfrac{4}{3}\log m\right) = o(m).$$ □



Thus we have the following theorem. 

Theorem 3. If arc lengths are independent and uniformly distributed on 
[1, . . . ,M], then with high probability, the algorithm runs in linear time. 

Remark. The expected and high-probability bounds also apply if the arc lengths come from [0, ..., U] and U = ω(m), as in this case, with high probability, no arc has zero length.
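A quick simulation illustrates the quantity that Theorem 3 controls. This sketch rests on our own assumptions: k is taken as $\lfloor \log_2 U \rfloor + 1$ for the largest sampled length U, $w(a) = k - \lfloor \log_2 \ell(a) \rfloor$, and `mean_arc_weight` is a hypothetical helper. It only estimates the average arc weight $\frac{1}{m}\sum_a w(a)$, not the running time of the algorithm itself.

```python
import random


def mean_arc_weight(m, M, seed=0):
    """Average of w(a) over m arc lengths drawn independently from {1, ..., M}.

    Assumption (ours): k = floor(log2 U) + 1 for the largest sampled length U,
    and w(a) = k - floor(log2 l(a)).
    """
    rng = random.Random(seed)
    lengths = [rng.randint(1, M) for _ in range(m)]
    k = max(lengths).bit_length()                  # floor(log2 U) + 1
    weights = [k - (l.bit_length() - 1) for l in lengths]
    return sum(weights) / m


if __name__ == "__main__":
    for m in (1000, 10000, 100000):
        print(m, mean_arc_weight(m, 10 ** 6))
```

With $M = 10^6$ the average should stay well below the constant 4 from the proof of Theorem 2 for every m tried. A bounded average means the total weight $\sum_a w(a)$ grows linearly in m.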
7 Concluding Remarks

We described our algorithm for binary multi-level buckets, with two buckets at each level except for the top level. One can easily extend the algorithm to base-Δ buckets, for any integer Δ ≥ 2. The best worst-case bound is obtained for the value of Δ at which the work of moving vertices to lower levels balances the work of scanning empty buckets during bucket expansion. Our average-case analysis reduces the former but not the latter. We get a linear running time when Δ is constant and the empty bucket scans can be charged to vertices in nonempty buckets. An interesting open question is whether one can get a linear average running time and a better worst-case running time, for example using techniques from […], without running several algorithms “in parallel.”

Our optimization is to detect vertices with exact distance labels before these vertices reach the bottom level of buckets and to place them into F. This technique can be used not only in the context of multi-level buckets, but also in the context of radix heaps [2] and hot queues.

Acknowledgments. The author would like to thank Jim Horning, Anna Karlin, 
Rajeev Raman, Bob Tarjan, and Eva Tardos for useful discussion and comments 
on a draft of this paper. We are also grateful to an anonymous referee for pointing

Abstract. We consider the single-source many-targets shortest-path (SSMTSP) problem in directed graphs with non-negative edge weights. A source node s and a target set T are specified, and the goal is to compute a shortest path from s to a node in T. Our interest in the shortest path problem with many targets stems from its use in weighted bipartite matching algorithms. A weighted bipartite matching in a graph with n nodes on each side reduces to n SSMTSP problems, where the number of targets varies between n and 1.

The SSMTSP problem can be solved by Dijkstra’s algorithm. We de- 
scribe a heuristic that leads to a significant improvement in running 
time for the weighted matching problem; in our experiments a speed-up 
by up to a factor of 10 was achieved. We also present a partial analysis 
that gives some theoretical support for our experimental findings. 

1 Introduction and Statement of Results 

A matching in a graph is a subset of the edges, no two of which share an endpoint. The weighted bipartite matching problem asks for the computation of a maximum weight matching in an edge-weighted bipartite graph G = (A ∪ B, E, w), where the cost function w : E → ℝ assigns a real weight to every edge. The weight of a matching M is simply the sum of the weights of the edges in the matching, i.e., $w(M) = \sum_{e \in M} w(e)$. One may either ask for a perfect matching of maximal weight (the weighted perfect matching problem or the assignment problem) or simply for a matching of maximal weight. Both versions of the problem can be solved by solving n, n = max(|A|, |B|), single-source many-targets shortest-path (SSMTSP) problems in a derived graph; see Sec. 4. We describe and analyse a heuristic improvement for the SSMTSP problem which leads to a significant speed-up in LEDA's weighted bipartite matching implementation; see Tab. 3.

In the SSMTSP problem we are given a directed graph G = (V, E) whose edges carry a non-negative cost. We use cost(e) to denote the cost of an edge e ∈ E. We are also given a source node s. Every node in V is designated as either free or non-free. We are interested in finding the shortest path from s to a free node.

The SSMTSP problem is easily solved by Dijkstra’s algorithm. Dijkstra’s al- 
gorithm (see Sec. 2) maintains a tentative distance for each node and a partition 
of the nodes into settled and unsettled. At the beginning all nodes are unset- 
tled. The algorithm operates in phases. In each phase, the unsettled node with 
smallest tentative distance is declared settled and its outgoing edges are relaxed 
in order to improve tentative distances of other unsettled nodes. The unsettled 
nodes are kept in a priority queue. The algorithm can be stopped once the first 
free node becomes settled. 

We describe a heuristic improvement. The improvement maintains an up- 
per bound for the tentative distance of free nodes and performs only queue 
operations with values smaller than the bound. All other queue operations are 
suppressed. The heuristic significantly reduces the number of queue operations 
and the running time of the bipartite matching algorithm, see Tab. 2 and Tab. 3. 

This paper is structured as follows. In Sec. 2 we discuss Dijkstra's algorithm for many targets and describe our heuristic. In Sec. 3 we give an analysis of the heuristic for random graphs and report on experiments on random graphs. In Sec. 4 we discuss the application to weighted bipartite matching algorithms and present our experimental findings for the matching problem.

The heuristic was first used by the second author in his jump-start routine 
for the general weighted matching algorithm [6,5]. When applied to bipartite 
graphs, the jump-start routine computes a maximum weight matching. When 
we compared the running time of the jump-start routine with LEDA’s bipartite 
matching code [4, Sec. 7.8], we found that the jump-start routine is consistently 
faster. We traced the superiority to the heuristic described in this paper. 



2 Dijkstra’s Algorithm with Many Targets 

It is useful to introduce some more notation. For a node v ∈ V, let d(v) be the shortest path distance from s to v, and let $d_0 = \min\{d(v) : v \text{ is free}\}$. If there is no free node reachable from s, $d_0 = +\infty$. Our goal is to compute (1) a node $v_0$ with $d(v_0) = d_0$ (or an indication that there is no such node), (2) the subset V' of nodes with $d(v) < d_0$, more precisely, $v \in V'$ if $d(v) < d_0$ and $d(v) \ge d_0$ if $v \notin V'$, and (3) the value d(v) for every node $v \in \{v_0\} \cup V'$, i.e., a partial function $\tilde d$ with $\tilde d(v) = d(v)$ for any $v \in \{v_0\} \cup V'$. (Observe that nodes v with $d(v) = d_0$ may or may not be in V'.) We refer to the problem just described as the single-source many-targets shortest-path (SSMTSP) problem. It is easily solved by an adapted version of Dijkstra's algorithm, as shown in Fig. 1.
Dijkstra's Algorithm (adapted version):
    dist(s) = 0 and dist(u) = +∞ for all u ∈ V, u ≠ s
    PQ.insert(s, 0)                       (insert (s, 0) into PQ)
    while not PQ.empty() do
      u = PQ.del_min()                    (remove node u from PQ with minimal priority)
      if u is free then STOP fi
      RELAX ALL OUTGOING EDGES OF u
    od

RELAX ALL OUTGOING EDGES OF u:
    forall e = (u, v) ∈ E do
      c = dist(u) + cost(e)
      if c < dist(v) then
        if dist(v) = +∞                   (v is not contained in PQ)
          then PQ.insert(v, c)            (insert (v, c) into PQ)
          else PQ.decrease_p(v, c)        (decrease priority of v in PQ to c)
        fi
        dist(v) = c
      fi
    od

Fig. 1. Dijkstra's algorithm adapted for many targets. When the first free node is removed from the queue, the algorithm is stopped: v0 is the node removed last and V' consists of all non-free nodes removed from the queue.



We maintain a priority queue PQ for the nodes of G. The queue is empty initially. For each node u ∈ V we compute a tentative distance dist(u) of a shortest path from s to u. Initially, we set dist(s) to zero and put the item (s, 0) into the priority queue. For each u ∈ V, u ≠ s, we set dist(u) to +∞ (no path from s to u has been encountered yet). In the main loop, we delete a node u with minimum dist-value from the priority queue. If u is free, we are done: $v_0 = u$ and V' is the set of nodes removed in preceding iterations. Otherwise, we relax all edges out of u. Consider an edge e = (u, v) and let c = dist(u) + cost(e). We check whether c is smaller than the current tentative distance of v. If so, we distinguish two cases. (1) If e is the first edge into v that is relaxed (this is the case iff dist(v) equals +∞), we insert an item (v, c) into PQ. (2) Otherwise, we decrease the priority of v in PQ to c. If a queue operation is performed, we also update dist(v).
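The procedure of Fig. 1 can be sketched in Python as follows. One substitution is ours: `heapq` has no decrease-priority operation, so a decrease is simulated by pushing a new entry and skipping stale entries on removal (lazy deletion). The function name, the adjacency-dict input format and the return tuple are hypothetical. Queue operations are counted as inserts plus simulated decreases.

```python
import heapq
import math


def ssmtsp_basic(adj, s, free):
    """Dijkstra's algorithm stopped at the first free node removed (cf. Fig. 1).

    adj[u] is a list of (v, cost) pairs with cost >= 0; free is a set of nodes.
    Returns (v0, d0, settled_nonfree, queue_ops); v0 is None if no free node is
    reachable. queue_ops counts inserts plus decrease-priority operations.
    """
    dist = {s: 0.0}
    heap = [(0.0, s)]
    done = set()
    settled = []                             # the set V' in removal order
    ops = 1                                  # the initial insert of (s, 0)
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                         # stale entry left by a "decrease"
        done.add(u)
        if u in free:
            return u, d, settled, ops
        settled.append(u)
        for v, cost in adj.get(u, ()):
            c = d + cost
            if c < dist.get(v, math.inf):
                # insert if v was never reached, otherwise decrease its priority;
                # both are realised by pushing a new heap entry
                dist[v] = c
                heapq.heappush(heap, (c, v))
                ops += 1
    return None, math.inf, settled, ops


# Small example: free nodes are 2 and 3; the nearest free node is 3 at distance 2.
example_adj = {0: [(1, 1.0), (2, 5.0)], 1: [(3, 1.0), (5, 5.0)], 2: [(4, 1.0)]}
example_free = {2, 3}
```

On the example, `ssmtsp_basic(example_adj, 0, example_free)` returns `(3, 2.0, [0, 1], 5)`: node 3 is the closest free node, V' = {0, 1}, and five queue operations are performed.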

We next describe a heuristic improvement of the scheme above. Let B be the smallest dist-value of a free node encountered by the algorithm; B = +∞ initially. We claim that queue operations PQ.op(·, c) with c > B may be skipped without affecting correctness. This is clear, since the algorithm stops when the first free node is removed from the queue and since the dist-value of this node is certainly at most B. Thus all dist-values less than $d(v_0)$ will be computed correctly. The modified algorithm may output a different node $v_0$ and a different set V'. However, if all distances are pairwise distinct, the same node $v_0$ and the same set V' as in the basic algorithm are computed.
Dijkstra's Algorithm with Pruning Heuristic:
    dist(s) = 0 and dist(u) = +∞ for all u ∈ V, u ≠ s
    B = +∞                                (initialize upper bound to +∞)
    PQ.insert(s, 0)                       (insert (s, 0) into PQ)
    while not PQ.empty() do
      u = PQ.del_min()                    (remove node u from PQ with minimal priority)
      if u is free then STOP fi
      RELAX ALL OUTGOING EDGES OF u
    od

RELAX ALL OUTGOING EDGES OF u:
    forall e = (u, v) ∈ E do
      c = dist(u) + cost(e)
      if c > B then continue fi           (prune edge if bound is exceeded)
      if v is free then B = min{c, B} fi  (try to improve bound)
      if c < dist(v) then
        if dist(v) = +∞                   (v is not contained in PQ)
          then PQ.insert(v, c)            (insert (v, c) into PQ)
          else PQ.decrease_p(v, c)        (decrease priority of v in PQ to c)
        fi
        dist(v) = c
      fi
    od

Fig. 2. Dijkstra's algorithm for many targets with a pruning heuristic. An upper bound B for d(v0) is maintained and queue operations PQ.op(·, c) with c > B are not performed.

The pruning heuristic can conceivably save on queue operations, since fewer 
insert and decrease priority operations may be performed. Figure 2 shows the 
algorithm with the heuristic added. 
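The pruning heuristic of Fig. 2 changes only the relaxation step. The Python sketch below uses `heapq` with lazy deletion in place of `decrease_p`, which is our substitution. It also adds a `prune` flag (our addition) so that both schemes can be compared on the same input. All names, including the random instance generator, are hypothetical.

```python
import heapq
import math
import random


def ssmtsp(adj, s, free, prune=True):
    """Dijkstra for many targets with an optional pruning heuristic (cf. Fig. 2).

    B (here `bound`) is the smallest tentative distance of a free node seen so
    far; relaxations with c > B cause no queue operation when prune is True.
    Returns (v0, d0, queue_ops), counting inserts plus decreases.
    """
    dist = {s: 0.0}
    heap = [(0.0, s)]
    done = set()
    bound = math.inf                         # B = +infinity initially
    ops = 1                                  # the initial insert of (s, 0)
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                         # stale heap entry (lazy deletion)
        done.add(u)
        if u in free:
            return u, d, ops
        for v, cost in adj.get(u, ()):
            c = d + cost
            if prune:
                if c > bound:
                    continue                 # prune: this operation cannot matter
                if v in free:
                    bound = min(bound, c)    # try to improve the bound
            if c < dist.get(v, math.inf):
                dist[v] = c
                heapq.heappush(heap, (c, v))
                ops += 1
    return None, math.inf, ops


def random_instance(n, degree, q, seed):
    """Hypothetical generator: `degree` random out-edges per node with costs in
    [0, 1); every node except the source 0 is free with probability q."""
    rng = random.Random(seed)
    adj = {u: [(rng.randrange(n), rng.random()) for _ in range(degree)]
           for u in range(n)}
    free = {v for v in range(1, n) if rng.random() < q}
    return adj, free


example_adj = {0: [(1, 1.0), (2, 5.0)], 1: [(3, 1.0), (5, 5.0)], 2: [(4, 1.0)]}
example_free = {2, 3}
```

On the small example, the basic scheme performs 5 queue operations and the pruned one 4. The edge (1, 5) is skipped because its candidate distance 6 exceeds B = 2. With pairwise distinct distances, both schemes should return the same $v_0$ and $d_0$.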



3 Analysis 

We perform a partial analysis of the basic and the modified version of Dijkstra's algorithm for many targets. We use n for the number of nodes, m for the expected number of edges, and f for the expected number of free nodes. We assume that our graphs are random graphs in the B(n, p) model with p = m/n², i.e., each of the possible edges is picked independently with probability p. We use c to denote pn = m/n. We also assume that a node is free with probability q = f/n and that edge costs are random reals between 0 and 1. We could alternatively use the model in which all graphs with m edges are equally likely and in which the free nodes form a random subset of f nodes. The results would be similar. We are mainly interested in the case where p = c/n for a small constant c, say 2 ≤ c ≤ 10, and q is a constant, i.e., the expected number of free nodes is a fixed fraction of the nodes.
Deletions from the Queue: We first analyze the number of nodes removed from the queue. If our graph were infinite and all nodes were reachable from s, the expected number would be 1/q, namely the expected number of trials until the first head occurs in a sequence of coin tosses with success probability q. However, our graph is finite (not really a serious difference if n is large) and only a subset of the nodes is reachable from s. Observe that the probability that s has no outgoing edge is $(1-p)^n \approx e^{-c}$. This probability is non-negligible. We proceed in two steps. We first analyze the number of nodes removed from the queue given the number R of nodes reachable from s, and in a second step we review results about the number R of reachable nodes.

Lemma 1. Let R be the number of nodes reachable from s in G and let T be the number of iterations, i.e., in iteration T the first free node is removed from the queue, or there is no free node reachable from s and T = R. Then $\Pr(T = t \mid R = r) = (1-q)^{t-1}q$ for $1 \le t < r$, and $\Pr(T = t \mid R = r) = (1-q)^{t-1}$ for $t = r$. Moreover, for the expected number of iterations we have $E[T \mid R = r] = 1/q - (1-q)^r/q$.

The proof is given in Appendix A. 
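The closed form in Lemma 1 can be cross-checked by summing $t \cdot \Pr(T = t \mid R = r)$ directly over the stated distribution. A small sketch; the function names are ours.

```python
def expected_iterations(r, q):
    """E[T | R = r] by summing t * Pr(T = t | R = r) over Lemma 1's distribution."""
    probs = [(1 - q) ** (t - 1) * q for t in range(1, r)]   # 1 <= t < r
    probs.append((1 - q) ** (r - 1))                         # t = r
    return sum(t * pr for t, pr in zip(range(1, r + 1), probs))


def expected_iterations_closed_form(r, q):
    """Closed form from Lemma 1: 1/q - (1 - q)**r / q."""
    return 1 / q - (1 - q) ** r / q


if __name__ == "__main__":
    print(expected_iterations(800, 0.02), expected_iterations_closed_form(800, 0.02))
```

For r much larger than 1/q the correction term vanishes and the expectation is essentially 1/q, e.g. about 50 for q = 0.02.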

The preceding lemma gives us information about the number of deletions from the queue. The expected number of edges relaxed is $c\,E[(T-1) \mid R = r]$, since T − 1 non-free nodes are removed from the queue and since the expected out-degree of every node is c = m/n. We conclude that the number of edges relaxed is about $(1/q - 1)(m/n)$.

Now, how many nodes are reachable from s? This quantity is analyzed in [2, pages 149–155]. Let α > 0 be such that α = 1 − exp(−cα), and let R be the number of nodes reachable from s. Then R is bounded by a constant with probability about 1 − α and is approximately αn with probability about α. More precisely, for every ε > 0 and δ > 0, there is a $t_0$ such that for all sufficiently large n, we have

$$1 - \alpha - 2\varepsilon \le \Pr(R \le t_0) \le 1 - \alpha + \varepsilon$$

and

$$\alpha - 2\varepsilon \le \Pr\bigl((1-\delta)\alpha n < R < (1+\delta)\alpha n\bigr) \le \alpha + \varepsilon.$$

Table 1 indicates that small values of ε and δ work even for moderate n. For c = 2, we have α ≈ 0.79681. We generated 10000 graphs with n = 1000 nodes and 2000 edges and determined the number of nodes reachable from a given source node s. This number was either at most 15 or at least 714. The latter case occurred in 7958 ≈ α · 10000 trials. Moreover, the average number of nodes reachable from s in the latter case was 796.5 ≈ α · 1000 = αn.
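The constant α in Table 1 is the positive root of α = 1 − exp(−cα). A fixed-point iteration recovers the tabulated values. This is a sketch: the iteration count is an arbitrary choice of ours, and the method assumes c > 1, since for c ≤ 1 the only root is 0.

```python
import math


def reachable_fraction(c, iterations=200):
    """Positive root of alpha = 1 - exp(-c * alpha), assuming c > 1.

    Starting at 1, the iterates decrease monotonically to the positive root;
    near the root the map has slope c * (1 - alpha) < 1, so it converges.
    """
    alpha = 1.0
    for _ in range(iterations):
        alpha = 1.0 - math.exp(-c * alpha)
    return alpha


if __name__ == "__main__":
    for c in (2, 5, 8):
        print(c, round(reachable_fraction(c), 4))
```

The same value also drives the footnote's bound: with c = 2, n = 1000 and f = 4 ln n, $\exp(-\alpha f)$ is far below $1/n^2$.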

For the sequel we concentrate on the case that at least $(1-\delta)\alpha n$ nodes are reachable from s. In this situation, the probability that all reachable nodes are removed from the queue is about

$$(1-q)^{\alpha n} = \exp\bigl(\alpha n \ln(1-q)\bigr) \approx \exp(-\alpha n q) = \exp(-\alpha f).$$
Table 1. For all experiments (except the one in the last column) we used random graphs with n = 1000 nodes and m = cn edges. For the last column we chose n = 2000 in order to illustrate that the dependency on n is weak. Nodes were free with probability q. The following quantities are shown; for each value of q and c we performed 10^4 trials.

α: the solution of the equation α = 1 − exp(−cα).
MS: the maximal number of nodes reachable from s when few nodes are reachable.
ML: the minimal number of nodes reachable from s when many nodes are reachable.
R: the average number of nodes reachable from s when many nodes are reachable.
F: the number of times many nodes are reachable from s.

    c      2        5        8        8
    α      0.7968   0.993    0.9997   0.9997
    MS     15       2        1        1
    ML     714      981      996      1995
    R      796.5    993      999.7    1999.3
    F      7958     9931     9997     9995

This is less than $1/n^2$ if c ≥ 2 and f ≥ 4 ln n,¹ an assumption which we are going to make. We use the phrase “R and f are large” to refer to this assumption.



Insertions into the Queue: We next analyze the number of insertions into the queue, first for the standard scheme.

Lemma 2. Let IS be the number of insertions into the queue in the standard scheme. Then $E[IS \mid T = t] = n - (n-1)(1-p)^{t-1}$ and

$$E[IS \mid R \text{ and } f \text{ are large}] \approx \frac{c}{q} - c + 1.$$

The proof is given in Appendix B.
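Before its final approximation, the derivation in Appendix B gives $E[IS] \approx n - q(n-1)/(p + q - pq)$ with p = c/n. The sketch below (function names ours) evaluates both forms. The exact form appears to reproduce the IS* row of Table 2, e.g. about 90.16 for n = 1000, c = 2, q = 0.02.

```python
def expected_insertions(n, c, q):
    """n - q(n-1)/(p + q - pq) with p = c/n, the form before the last step."""
    p = c / n
    return n - q * (n - 1) / (p + q - p * q)


def expected_insertions_approx(c, q):
    """The approximation c/q - c + 1 of Lemma 2, valid when c/n << q."""
    return c / q - c + 1


if __name__ == "__main__":
    print(expected_insertions(1000, 2, 0.02), expected_insertions_approx(2, 0.02))
```

For n = 1000, c = 2, q = 0.02 the approximation gives 99, noticeably above the exact value of about 90.16. Here c/n = 0.002 is only ten times smaller than q, so the condition c/n ≪ q holds only loosely.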

Observe that the standard scheme makes about c/q insertions into, but only 1/q removals from, the queue. This is where the refined scheme saves. Let INRS be the number of nodes which are inserted into the queue but never removed in the standard scheme. Then, by the above,

$$E[\mathit{INRS} \mid R \text{ and } f \text{ are large}] \approx \frac{c}{q} - c + 1 - \frac{1}{q} \approx \frac{c-1}{q}.$$

The standard scheme also performs some decrease-priority operations on the nodes inserted but never removed. This number is small since the average number of incoming edges scanned per node is small.

We turn to the refined scheme. We have three kinds of savings.

– Nodes that are removed from the queue may incur fewer queue operations because they are inserted later or because some distance decreases do not lead to a queue operation. This saving is small since the number of distance decreases is small (recall that only few incoming edges per node are scanned).

– Nodes that are never removed from the queue in the standard scheme are not inserted in the refined scheme. This saving is significant and we will estimate it below.

– Nodes that are never removed from the queue in the standard scheme are inserted in the refined scheme, but fewer decreases of their distance labels lead to a queue operation. This saving is small for the same reason as in the first item.

¹ For c ≥ 2, we have α > 1/2 and thus $\exp(-\alpha f) < \exp(-f/2)$. Choosing f ≥ 4 ln n, we obtain $\exp(-\alpha f) < 1/n^2$.



We concentrate on the set of nodes that are inserted into but never removed from the queue in the standard scheme. How many of these INRS insertions are also performed in the refined scheme? We use INRR to denote their number. We compute the expectation of INRR conditioned on the event $E_l$, $l \in \mathbb{N}$, that in the standard scheme there are exactly l nodes which are inserted into the queue but not removed.

Let $e_1 = (u_1, v_1), \ldots, e_l = (u_l, v_l)$ be the edges whose relaxations lead to the insertions of nodes that are not removed, labeled in the order of their relaxations. Then $d(u_i) \le d(u_{i+1})$, $1 \le i \le l-1$, since nodes are removed from the queue in non-decreasing order of their distance values.

Node $v_i$ is inserted with value $d(u_i) + w(e_i)$; $d(u_i) + w(e_i)$ is a random number in the interval $[d(t), d(u_i) + 1]$, where t is the target node closest to s, since the fact that $v_i$ is never removed from the queue implies $d(u_i) + w(e_i) \ge d(t)$ but reveals nothing else about the value of $d(u_i) + w(e_i)$.

In the refined scheme, $e_i$ leads to an insertion only if $d(u_i) + w(e_i)$ is smaller than $d(u_j) + w(e_j)$ for every free $v_j$ with $j < i$. The probability for this event is at most $1/(k+1)$, where k is the number of free $v_j$ preceding $v_i$. The probability would be exactly $1/(k+1)$ if the values $d(u_h) + w(e_h)$, $1 \le h \le i$, were all contained in the same interval. Since the upper bound of the interval containing $d(u_h) + w(e_h)$ increases with h, the probability is at most $1/(k+1)$.

Thus (the expectation is conditioned on the event $E_l$)

$$E[\mathit{INRR} \mid E_l] \le \sum_{1 \le i \le l} \sum_{0 \le k < i} \binom{i-1}{k} q^k (1-q)^{i-1-k} \frac{1}{k+1}.$$



In Appendix C, we show that

$$E[\mathit{INRR} \mid E_l] \le \frac{1}{q}\bigl(1 + \ln(lq)\bigr).$$

Since $\ln(lq)$ is a concave function of l (its first derivative is positive and its second derivative is negative), we obtain an upper bound on the expectation of INRR conditioned on R and f being large if we replace INRS by its expectation. We obtain

$$E[\mathit{INRR} \mid R \text{ and } f \text{ are large}] \le \frac{1}{q}\Bigl(1 + \ln\bigl(q\,E[\mathit{INRS} \mid R \text{ and } f \text{ are large}]\bigr)\Bigr) \approx \frac{1}{q}\left(1 + \ln\left(q \cdot \frac{c-1}{q}\right)\right) = \frac{1}{q}\bigl(1 + \ln(c-1)\bigr).$$
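The predicted value INRR* in Table 2 follows the formula $(1/q)(1 + \ln(q \cdot \mathit{INRS}^*))$ given in its caption. The sketch below (names ours) evaluates it with INRS* taken from the table, together with the asymptotic form $(1/q)(1 + \ln(c-1))$ derived above.

```python
import math


def predicted_inrr(inrs, q):
    """INRR* = (1/q) * (1 + ln(q * INRS*)), as in the caption of Table 2."""
    return (1 + math.log(q * inrs)) / q


def inrr_asymptotic(c, q):
    """(1/q) * (1 + ln(c - 1)), obtained by substituting INRS ~ (c - 1)/q."""
    return (1 + math.log(c - 1)) / q


if __name__ == "__main__":
    print(round(predicted_inrr(40.16, 0.02), 2))   # column c = 2, q = 0.02
    print(round(predicted_inrr(95.60, 0.06), 2))   # column c = 8, q = 0.06
```

The asymptotic form can be much larger than INRR* at n = 1000. For c = 8 and q = 0.02 the table's INRS* (232.3) is well below (c − 1)/q = 350.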
Table 2. For all experiments (except the one in the last column) we used random graphs with n = 1000 nodes and m = cn edges. For the last column we chose n = 2000 in order to illustrate that the dependency on n is weak. Nodes were free with probability q. The following quantities are shown; for each value of q and c we performed 10^4 trials. Trials where only a small number of nodes were reachable from s were ignored, i.e., about a fraction 1 − α of the trials was ignored.

D: the number of deletions from the queue.
D* = 1/q: the predicted number of deletions from the queue.
IS: the number of insertions into the queue in the standard scheme.
IS*: the predicted number of insertions into the queue.
INRS: the number of nodes inserted but never removed.
INRS* = IS* − D*: the predicted number.
INRR: the number of extra nodes inserted by the refined scheme.
INRR* = (1/q) · (1 + ln(q · INRS*)): the predicted number.
DPs: the number of decrease-priority operations in the standard scheme.
DPr: the number of decrease-priority operations in the refined scheme.
Qs: the total number of queue operations in the standard scheme.
Qr: the total number of queue operations in the refined scheme.
S = Qs − Qr: the number of saved queue operations.
S*: the lower bound on the number of saved queue operations.
P = S/Qs: the percentage of queue operations saved.



    c       2        2        2        5        5        5        8        8        8        8
    q       0.02     0.06     0.18     0.02     0.06     0.18     0.02     0.06     0.18     0.18
    D       49.60    16.40    5.51     49.33    16.72    5.50     50.22    16.79    5.61     5.53
    D*      50.00    16.67    5.56     50.00    16.67    5.56     50.00    16.67    5.56     5.56
    IS      90.01    31.40    10.41    195.20   73.71    22.98    281.30   112.90   36.45    36.52
    IS*     90.16    31.35    10.02    197.60   73.57    23.25    282.30   112.30   36.13    36.77
    INRS    40.41    15.00    4.89     145.80   56.99    17.49    231.00   96.07    30.85    30.99
    INRS*   40.16    14.68    4.46     147.60   56.90    17.69    232.30   95.60    30.57    31.22
    INRR    11.00    4.00     1.00     35.00    12.00    4.00     51.00    18.00    5.00     5.00
    INRR*   39.05    14.56    4.34     104.10   37.13    11.99    126.80   45.78    15.03    15.15
    DPs     1.42     0.19     0.02     13.78    1.90     0.19     36.55    5.28     0.56     0.28
    DPr     0.71     0.09     0.01     2.63     0.31     0.03     4.60     0.50     0.05     0.03
    Qs      140.00   46.98    14.94    257.30   91.33    27.67    367.00   133.90   41.62    41.34
    Qr      110.40   36.12    11.52    134.50   45.33    13.97    154.40   50.85    16.00    15.77
    S       29.58    10.86    3.42     122.80   46.00    13.69    212.70   83.08    25.62    25.57
    S*      1.12     0.13     0.12     43.47    19.77    5.70     105.50   49.82    15.54    16.07
    P       21.12    23.11    22.87    47.74    50.37    49.50    57.94    62.03    61.55    61.85


We can now finally lower bound the number S of queue operations saved. By the above, the saving is at least INRS − INRR. Thus

$$E[S \mid R \text{ and } f \text{ are large}] \gtrsim \frac{c-1}{q} - \frac{1}{q}\bigl(1 + \ln(c-1)\bigr) \ge \frac{c}{q}\left(1 - \frac{2 + \ln c}{c}\right).$$

We thus have a guaranteed saving of about $(c - 2 - \ln c)/q$ queue operations. Moreover, if $(2 + \ln c)/c < 1$, we are guaranteed to save a constant fraction of the queue operations. For example, if c = 8, we will save at least a fraction of $1 - (2 + \ln 8)/8 \approx 0.49$ of the queue operations. The actual savings are higher; see Tab. 2. Also, there are substantial savings even if the assumption of R and f being large does not hold (e.g., for c = 2 and q = 0.02).
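The lower bound can be evaluated directly. We read the S* row of Table 2 as INRS* − INRR*; this reading is ours, but it matches the tabulated values. The sketch below (names ours) computes that bound and the guaranteed fraction $1 - (2 + \ln c)/c$.

```python
import math


def saving_lower_bound(inrs, q):
    """INRS - (1/q)(1 + ln(q * INRS)): our reading of the S* row of Table 2."""
    return inrs - (1 + math.log(q * inrs)) / q


def guaranteed_fraction(c):
    """Lower bound 1 - (2 + ln c)/c on the fraction of queue operations saved."""
    return 1 - (2 + math.log(c)) / c


if __name__ == "__main__":
    for c in (2, 5, 8):
        print(c, round(guaranteed_fraction(c), 3))
```

For c = 8 the fraction is about 0.49. For c = 2 it is negative, so the bound guarantees nothing there, which is consistent with the small S* entries for c = 2 in Table 2.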

It is interesting to observe how our randomness assumptions were used in the 
argument above. G is a random graph and hence the number of nodes reachable 
from s is either bounded or very large. Also, the expected number of nodes 
reached after t removals from the queue has a simple formula. The fact that a 
node is free with fixed probability gives us the distribution of the number of 
deletions from the queue. In order to estimate the savings resulting from the 
refined scheme we use that every node has the same chance of being free and 
that edge weights are random. For this part of the argument we do not need 
that our graph is random. 



4 Bipartite Matching Problems 

Both versions of the weighted bipartite matching problem, i.e., the assignment problem and the maximum weight matching problem, can be reduced to solving n, n = max(|A|, |B|), SSMTSP problems; we discuss the reduction for the assignment problem.

A popular algorithm for the assignment problem follows the primal-dual paradigm [1, Sec. 12.4], [4, Sec. 7.8], [3]. The algorithm constructs a perfect matching and a dual solution simultaneously. A dual solution is simply a function π : V → ℝ that assigns a real potential to every node. We use V to denote A ∪ B. The algorithm maintains a matching M and a potential function π with the property that

(a) w(e) ≤ π(a) + π(b) for every edge e = (a, b),

(b) w(e) = π(a) + π(b) for every edge e = (a, b) ∈ M, and

(c) π(b) = 0 for every free² node b ∈ B.

Initially, M = ∅, $\pi(a) = \max_{e \in E} w(e)$ for every a ∈ A, and π(b) = 0 for every b ∈ B. The algorithm stops when M is a perfect matching³ or when it discovers that there is no perfect matching. The algorithm works in phases. In each phase the size of the matching is increased by one (or it is determined that there is no perfect matching).

A phase consists of the search for an augmenting path of minimum reduced cost. An augmenting path is a path starting at a free node in A, ending at a free node in B, and using alternately edges not in M and in M. The reduced cost of an edge e = (a, b) is defined as $\bar w(e) = \pi(a) + \pi(b) - w(e)$; observe that edges in M have reduced cost zero and that all edges have non-negative reduced cost. The reduced cost of a path is simply the sum of the reduced costs of the edges
² A node is free if no edge in M is incident to it.

³ It is easy to see that M has maximal weight among all perfect matchings. Observe that if M' is any perfect matching and π is any potential function such that (a) holds, then $w(M') \le \sum_{v \in V} \pi(v)$. If (b) also holds, we have a pair with equality, and hence the matching has maximal weight (and the node potential has minimal weight among all potentials satisfying (a)).
contained in it. There is no need to search for augmenting paths from all free nodes in A; it suffices to search for augmenting paths from a single arbitrarily chosen free node $a_0 \in A$.

If no augmenting path starting in $a_0$ exists, there is no perfect matching in G and the algorithm stops. Otherwise, for every v ∈ V, let d(v) be the minimal reduced cost of an alternating path from $a_0$ to v. Let $b_0 \in B$ be a free node in B which minimizes d(b) among all free nodes b in B. We update the potential function according to the following rules (we use π' to denote the new potential function):

(d) $\pi'(a) = \pi(a) - \max(d(b_0) - d(a), 0)$ for all a ∈ A,

(e) $\pi'(b) = \pi(b) + \max(d(b_0) - d(b), 0)$ for all b ∈ B.

It is easy to see that this change maintains (a), (b), and (c) and that all edges on the least-cost alternating path p from $a_0$ to $b_0$ become tight⁴. We complete the phase by switching the edges on p: matching edges on p become non-matching and non-matching edges become matching edges. This increases the size of the matching by one.⁵

A phase is tantamount to an SSMTSP problem: $a_0$ is the source and the free nodes of B are the targets. We want to determine a target (= free node) $b_0$ with minimal distance from $a_0$ and the distance values of all nodes v with $d(v) < d(b_0)$. For nodes v with $d(v) \ge d(b_0)$, there is no need to know the exact distance. It suffices to know that the distance is at least $d(b_0)$.
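To make the reduction concrete, here is a compact Python sketch of the primal-dual scheme described above. Each phase is solved as an SSMTSP instance by Dijkstra's algorithm with the pruning heuristic of Fig. 2. It is not LEDA's implementation, and it rests on several assumptions of ours. The weight matrix w is dense, with None marking a missing edge. Weights are integers, so reduced costs stay exact. Python's `heapq` with lazy deletion stands in for `decrease_p`. The names `assignment` and `brute_force` are hypothetical. Only nodes removed from the queue get their potentials changed. This suffices because rules (d) and (e) leave every node with $d(v) \ge d(b_0)$ unchanged.

```python
import heapq
import math
from itertools import permutations


def assignment(w):
    """Max-weight perfect matching via primal-dual phases (sketch of Sect. 4).

    w[a][b] is an integer weight or None. A-node a has index a, B-node b has
    index n + b. Returns mate_of_a as a list of b's, or None if no perfect
    matching exists.
    """
    n = len(w)
    weights = [x for row in w for x in row if x is not None]
    pi = [max(weights, default=0)] * n + [0] * n     # initial potentials
    mate = [None] * (2 * n)
    for a0 in range(n):                              # one phase per A-node
        dist, pred = [math.inf] * (2 * n), [None] * (2 * n)
        dist[a0] = 0
        heap, settled, bound, b0 = [(0, a0)], [], math.inf, None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                             # stale heap entry
            settled.append(u)
            if u >= n and mate[u] is None:
                b0 = u                               # first free B-node: stop
                break
            if u >= n:                               # matched B-node: matching edge
                out = [(mate[u], 0)]
            else:                                    # A-node: non-matching edges
                out = [(n + b, pi[u] + pi[n + b] - w[u][b])
                       for b in range(n)
                       if w[u][b] is not None and mate[u] != n + b]
            for v, rc in out:
                c = d + rc
                if c > bound:
                    continue                         # pruning heuristic
                if v >= n and mate[v] is None:
                    bound = min(bound, c)            # improve bound B
                if c < dist[v]:
                    dist[v], pred[v] = c, u
                    heapq.heappush(heap, (c, v))
        if b0 is None:
            return None                              # no perfect matching
        d0 = dist[b0]
        for v in settled:                            # rules (d) and (e)
            delta = max(d0 - dist[v], 0)
            pi[v] += delta if v >= n else -delta
        b = b0                                       # switch edges on the path
        while b is not None:
            a = pred[b]
            b_prev = mate[a]                         # None once a == a0
            mate[a], mate[b] = b, a
            b = b_prev
    return [mate[a] - n for a in range(n)]


def brute_force(w):
    """Best total weight over all perfect matchings of a complete instance."""
    n = len(w)
    return max(sum(w[a][p[a]] for a in range(n)) for p in permutations(range(n)))


if __name__ == "__main__":
    print(assignment([[5, 4], [5, 1]]))              # prints [1, 0]
```

The brute-force helper enumerates all n! perfect matchings and is only meant for checking the sketch on tiny inputs.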

Table 3 shows the effect of the pruning heuristic for the bipartite matching 
algorithm. (The improved code will be part of LEDA Version 4.3.) 

A Proof of Lemma 1 

Proof (Lemma 1). Since each node is free with probability q = f/n and since the property of being free is independent of the order in which nodes are removed from the queue, we have $\Pr(T = t \mid R = r) = (1-q)^{t-1}q$ and $\Pr(T \ge t \mid R = r) = (1-q)^{t-1}$ for $1 \le t < r$. If t = r, $\Pr(T = r \mid R = r) = (1-q)^{r-1} = \Pr(T \ge r \mid R = r)$.

The expected number of iterations is

$$E[T \mid R = r] = \sum_{t \ge 1} \Pr(T \ge t \mid R = r) = \sum_{1 \le t < r} (1-q)^{t-1} + (1-q)^{r-1} = \frac{1 - (1-q)^r}{q}.$$



⁴ An edge is called tight if its reduced cost is zero.

⁵ The correctness of the algorithm can be seen as follows. The algorithm maintains properties (a), (b), and (c), and hence the current matching M is optimal in the following sense. Let $A_M$ be the nodes in A that are matched. Then M is a maximal weight matching among the matchings that match the nodes in $A_M$ and leave the nodes in $A \setminus A_M$ unmatched. Indeed, if M' is any such matching, then $w(M') \le \sum_{a \in A_M} \pi(a) + \sum_{b \in B} \pi(b) = w(M)$, where the inequality follows from (a) and (c) and the equality follows from (b) and (c).
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Table 3. Effect of the pruning heuristic. LEDA stands for LEDA’s bipartite matching 
algorithm (up to version LEDA-4.2) as described in [4, Sec. 7.8] and MS stands for a 
modified implementation with the pruning heuristic. We created random graphs with 
n nodes on each side of the bipartition and cn edges inbetween. The running time is 
stated in CPU-seconds and is an average of 10 trials. 



Unit Weights

            c = 2             c = 3             c = 4
    n      LEDA      MS      LEDA      MS      LEDA      MS
10000     24.14    8.84     31.56    6.09     34.64    4.17
20000     83.95   30.77    113.14   21.73    125.60   13.94
40000    300.38  107.40    426.43   75.12    477.63   44.91

Random Weights [0 ... 1000]

            c = 2             c = 3             c = 4
    n      LEDA      MS      LEDA      MS      LEDA      MS
10000      1.20    0.88      4.94    2.47     15.07    6.41
20000      2.61    1.86     10.35    5.09     35.76   14.34
40000      6.08    4.17     23.85   11.51     84.57   33.98

Random Weights [1000 ... 1005]

            c = 2             c = 3             c = 4
    n      LEDA      MS      LEDA      MS      LEDA      MS
10000      9.80    6.55     14.77    9.12     17.57    9.44
20000     26.27   19.15     46.13   27.02     62.45   30.24
40000     86.32   59.10    155.98   86.65    166.04   92.63
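The graph families of Table 3 can be produced along the following lines. This is a minimal Python sketch, not the generator used for the experiments: the function name `random_bipartite_graph` is hypothetical, and we assume that the cn edges are distinct pairs chosen uniformly at random and that "random weights [lo ... hi]" means integers drawn uniformly from that closed interval.

```python
import random

def random_bipartite_graph(n, c, weight_range=None, seed=None):
    """Return c*n distinct edges (a, b, w) with a in A = {0..n-1} and b in B = {0..n-1}.

    weight_range=None gives unit weights; otherwise (lo, hi) draws integer
    weights uniformly from [lo, hi] (an assumption about the original setup).
    """
    m = c * n
    if m > n * n:
        raise ValueError("more edges requested than the complete bipartite graph has")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:                      # rejection sampling of distinct pairs
        edges.add((rng.randrange(n), rng.randrange(n)))
    graph = []
    for a, b in sorted(edges):
        w = 1 if weight_range is None else rng.randint(*weight_range)
        graph.append((a, b, w))
    return graph

# The three weight regimes of Table 3, for n = 10000 and c = 3:
unit = random_bipartite_graph(10000, 3, seed=1)
wide = random_bipartite_graph(10000, 3, (0, 1000), seed=1)
narrow = random_bipartite_graph(10000, 3, (1000, 1005), seed=1)
```

Sampling distinct pairs keeps the edge count exactly cn; whether the original experiments allowed parallel edges is not stated.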



B Proof of Lemma 2 



Proof (Lemma 2). In the standard scheme every node that is reached by the search is inserted into the queue. If we remove a total of t elements from the queue, the edges out of t - 1 elements are scanned. A node v, v ≠ s, is not reached if none of these t - 1 nodes has an edge into v. The probability for this to happen is $(1-p)^{t-1}$, and hence the expected number $E[IS \mid T = t]$ of nodes reached is $n - (n-1)(1-p)^{t-1}$. This is also the number of insertions into the queue under the standard scheme.

If R and f are large, we have

$$
\begin{aligned}
E[IS \mid R \text{ and } f \text{ are large}]
  &= \sum_{t \ge 1} E[IS \mid T = t,\ R \text{ and } f \text{ are large}] \cdot \Pr(T = t \mid R \text{ and } f \text{ are large}) \\
  &= \sum_{1 \le t < R} \bigl(n - (n-1)(1-p)^{t-1}\bigr)(1-q)^{t-1} q + \bigl(n - (n-1)(1-p)^{R-1}\bigr)(1-q)^{R-1} \\
  &= \sum_{t \ge 1} \bigl(n - (n-1)(1-p)^{t-1}\bigr)(1-q)^{t-1} q + o(1) \\
  &= n - q(n-1) \sum_{t \ge 0} (1-q)^t (1-p)^t + o(1) = n - \frac{q(n-1)}{p + q - pq} + o(1) \\
  &= (n-1)\,\frac{p(1-q)}{p + q - pq} + 1 + o(1) \\
  &= \frac{c(1-q)(1 - 1/n)}{q + (1-q)c/n} + 1 + o(1) \\
  &\approx \frac{c}{q} - c + 1 + o(1),
\end{aligned}
$$

where the last two steps use p = c/n.

□ 
The final approximation is valid if $c/n \ll q$. The approximation makes sense intuitively. We relax the edges out of $1/q - 1$ nodes and hence relax about c times as many edges. There is hardly any sharing of targets between these edges if n is large. We conclude that the number of insertions into the queue is $c/q - c + 1$.
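The chain of equalities in the proof of Lemma 2 can be checked numerically. The following Python sketch (function names are ours, for illustration) evaluates the closed form $n - q(n-1)/(p+q-pq)$ with $p = c/n$, the rearranged form $(n-1)p(1-q)/(p+q-pq) + 1$, and the approximation $c/q - c + 1$. It drops the o(1) terms and does not simulate the search itself.

```python
def insertions_exact(n, c, q):
    """n - q(n-1)/(p + q - pq) with p = c/n (o(1) terms dropped)."""
    p = c / n
    return n - q * (n - 1) / (p + q - p * q)

def insertions_rearranged(n, c, q):
    """(n-1) p (1-q)/(p + q - pq) + 1: the same quantity without the large cancellation."""
    p = c / n
    return (n - 1) * p * (1 - q) / (p + q - p * q) + 1

def insertions_approx(c, q):
    """c/q - c + 1, valid when c/n is much smaller than q."""
    return c / q - c + 1

for n, c, q in [(10**6, 3, 0.1), (10**6, 4, 0.01), (1000, 4, 0.001)]:
    print(n, c, q, insertions_exact(n, c, q), insertions_approx(c, q))
```

The last parameter set violates $c/n \ll q$, and there the approximation is far off, as the text predicts.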



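C Estimation of E[INRR | E_1]

We have:

$$E[\mathit{INRR} \mid E_1] \le \sum_{1 \le i \le l} \sum_{0 \le k < i} \binom{i-1}{k} q^k (1-q)^{i-1-k} \frac{1}{k+1} = \sum_{1 \le i \le l} \frac{1}{iq} \sum_{1 \le k \le i} \binom{i}{k} q^k (1-q)^{i-k} = \sum_{1 \le i \le l} \frac{1 - (1-q)^i}{iq},$$

where the first equality follows from $\frac{1}{k+1}\binom{i-1}{k} = \frac{1}{i}\binom{i}{k+1}$. The final formula can also be interpreted intuitively. There are about $iq$ free nodes preceding $v_i$, and hence $v_i$ is inserted with probability about $1/(iq)$.

In order to estimate the final sum we split it at an index $i_0$ that is yet to be determined. For $i < i_0$, we estimate $1 - (1-q)^i \le iq$, and for $i \ge i_0$, we use $1 - (1-q)^i \le 1$. We obtain

$$E[\mathit{INRR} \mid E_1] \le \sum_{1 \le i < i_0} 1 + \sum_{i_0 \le i \le l} \frac{1}{iq} \approx i_0 + \frac{1}{q} \ln\frac{l}{i_0}.$$

For $i_0 = 1/q$ (which minimizes the final expression) we have

$$E[\mathit{INRR} \mid E_1] \le \frac{1}{q}\bigl(1 + \ln(lq)\bigr).$$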
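Both the binomial identity behind the first equality and the final estimate are easy to check numerically. This Python sketch (helper names are ours) compares the inner binomial sum with the closed form $(1-(1-q)^i)/(iq)$, and compares the full sum with the bound $(1/q)(1 + \ln(lq))$ for one choice of l and q with $1/q$ an integer.

```python
import math

def inner_sum(i, q):
    """sum_{0<=k<i} C(i-1, k) q^k (1-q)^(i-1-k) / (k+1), i.e. E[1/(X+1)] for X ~ Bin(i-1, q)."""
    return sum(math.comb(i - 1, k) * q**k * (1 - q)**(i - 1 - k) / (k + 1) for k in range(i))

def closed_form(i, q):
    """(1 - (1-q)^i) / (iq)."""
    return (1 - (1 - q)**i) / (i * q)

def inrr_sum(l, q):
    """sum_{1<=i<=l} (1 - (1-q)^i) / (iq), the quantity bounded in the text."""
    return sum(closed_form(i, q) for i in range(1, l + 1))

def inrr_bound(l, q):
    """(1/q)(1 + ln(lq)), the estimate obtained with i_0 = 1/q."""
    return (1 + math.log(l * q)) / q

l, q = 100_000, 0.01
print(inrr_sum(l, q), inrr_bound(l, q))
```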
Abstract. Real algebraic expressions are expressions whose leaves are integers and whose internal nodes are additions, subtractions, multiplications, divisions, k-th root operations for integral k, and taking roots of polynomials whose coefficients are given by the values of subexpressions. We consider the sign computation of real algebraic expressions, a task vital for the implementation of geometric algorithms. We prove a new separation bound for real algebraic expressions and compare it analytically and experimentally with previous bounds. The bound is used in the sign test of the number type leda_real.



1 Introduction 

Real algebraic expressions are expressions whose leaves are integers and whose internal nodes are additions, subtractions, multiplications, divisions, k-th root operations for integral k, and taking roots of polynomials whose coefficients are given by the values of subexpressions; the exact definition is given below. Examples are $\sqrt{17} + \sqrt{21} - \sqrt{17 + 21 + 2\sqrt{357}}$ and […]. We consider the sign computation of real algebraic expressions.

Our main motivation is the implementation of geometric algorithms. The evaluation 
of geometric predicates, such as the incircle or the side-of predicate, amounts to the 
computation of the sign of an expression. Non-linear objects (circles, ellipses, . . . ) lead 
to expressions involving roots and hence an efficient method for computing signs of 
algebraic expressions is an essential basis for the robust implementation of geometric 
algorithms dealing with non-linear objects. 

The separation bound approach is the most successful approach to sign computation; it is, for example, used in the number type leda_real […] and the number type Expr of the CORE package […]. A separation bound is an easily computable function sep mapping expressions into positive real numbers such that the value ξ of any non-zero expression E is lower bounded by sep(E), i.e., either ξ = 0 or |ξ| ≥ sep(E). Separation bounds allow one to determine the sign of an expression by numerical computation. An error bound Δ is initialized to some positive value, say Δ = 1, and an approximation $\tilde\xi$ of ξ with $|\tilde\xi - \xi| \le \Delta$ is computed using approximate arithmetic, say floating-point arithmetic with arbitrary-length mantissa. If $|\tilde\xi| > \Delta$, the sign of ξ is equal to the sign of $\tilde\xi$. Otherwise, $|\tilde\xi| \le \Delta$ and hence $|\xi| \le 2\Delta$. If 2Δ < sep(E), we have ξ = 0. If 2Δ ≥ sep(E), we halve Δ and repeat. The worst-case complexity of the procedure just outlined is determined by the separation bound; log(1/sep(E)) determines the maximal precision needed for the computation of $\tilde\xi$, and we refer to log(1/sep(E)) as the bit bound. If ξ ≠ 0, the actual precision required is about log(1/|ξ|), and hence "easy sign tests" are much faster than the worst case. This feature distinguishes the separation bound approach to sign computation from approaches that explicitly compute a defining polynomial.

* Partially supported by the ESPRIT LTR project (Effective Computational Geometry for Curves and Surfaces).
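The procedure just outlined fits in a few lines. The following Python sketch uses the standard `decimal` module in place of a bigfloat type and applies the procedure to the first example of the introduction. The helper names are hypothetical, and the error model is an assumption rather than a rigorous analysis: we evaluate with about $\log_{10}(1/\Delta)$ plus `guard` significant digits and assume the resulting absolute error is below Δ. The separation bound $u(E)^{-15}$ with $u(E) = 2(\sqrt{17} + \sqrt{21})$, $l(E) = 1$ and $D(E) = 16$ is what the rules of Theorem 1 give for this expression when worked out by hand; treat it as illustrative.

```python
import math
from decimal import Decimal, localcontext

def make_approx(evaluate, guard=15):
    """Turn `evaluate` (Decimal arithmetic in the current context) into approx(delta).

    Assumption: with about log10(1/delta) + guard significant digits the
    absolute error of the result is below delta (heuristic, not proven here).
    """
    def approx(delta):
        with localcontext() as ctx:
            ctx.prec = max(1, -Decimal(delta).adjusted()) + guard
            return evaluate()
    return approx

def sign_by_separation_bound(approx, sep):
    """Return -1, 0 or 1 following the halving procedure described in the text."""
    delta = 1.0
    while True:
        xt = approx(delta)                     # |xt - xi| <= delta (assumed)
        if abs(xt) > Decimal(delta):
            return 1 if xt > 0 else -1         # sign of xt equals sign of xi
        if 2 * delta < sep:                    # |xi| <= 2*delta < sep(E) forces xi = 0
            return 0
        delta /= 2                             # otherwise refine and repeat

def binomial_example():
    # sqrt(17) + sqrt(21) - sqrt(17 + 21 + 2*sqrt(357)), whose value is exactly 0
    return (Decimal(17).sqrt() + Decimal(21).sqrt()
            - (Decimal(17) + Decimal(21) + 2 * Decimal(357).sqrt()).sqrt())

u = 2 * (math.sqrt(17) + math.sqrt(21))        # u(E); l(E) = 1 and D(E) = 16
sep = 1 / u**15                                # sep(E) = 1 / (l(E) * u(E)^(D(E)-1))
print(sign_by_separation_bound(make_approx(binomial_example), sep))  # prints 0
```

A non-zero expression returns as soon as $|\tilde\xi| > \Delta$, which is the "easy sign test" behaviour; the zero expression needs about $\log_2(1/\mathrm{sep}(E)) \approx 62$ halvings.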

Separation bounds have been studied extensively in computer algebra […] as well as in computational geometry […]. We prove a new separation bound for the following class of real algebraic expressions. The value of a real algebraic expression is either a real algebraic number or undefined (at the end of Section 3 we show how to test whether the value of an expression is defined).

(1) Any integer N is a real algebraic expression. The integer is also its value.

(2) If $E_1$ and $E_2$ are real algebraic expressions, so are $E_1 + E_2$, $E_1 - E_2$, $E_1 \cdot E_2$, $E_1 / E_2$, and $\sqrt[k]{E_1}$, where $k \ge 2$ is an integer. The value of $\sqrt[k]{E_1}$ is undefined if k is even and the value of $E_1$ is negative. The value of $E_1 / E_2$ is undefined if the value of $E_2$ is zero. The value of $E_1 \mathbin{op} E_2$ or $\sqrt[k]{E_1}$ is undefined if the value of $E_1$ or the value of $E_2$ is undefined. Otherwise the value of $E_1 + E_2$, $E_1 - E_2$, $E_1 \cdot E_2$, and $E_1 / E_2$ is the sum, the difference, the product, and the quotient of the values of $E_1$ and $E_2$, respectively, and the value of $\sqrt[k]{E_1}$ is the k-th root of the value of $E_1$.

(3) If $E_d, E_{d-1}, \ldots, E_1, E_0$ are real algebraic expressions and j is a positive integer with $0 < j \le d$, then $\diamond(j, E_d, E_{d-1}, \ldots, E_1, E_0)$ is an expression. If the values of the $E_i$ are defined and $\xi_i$ is the value of $E_i$, the value of the expression is the j-th smallest real root of the polynomial $\xi_d X^d + \xi_{d-1} X^{d-1} + \cdots + \xi_0$, if the polynomial has at least j real roots. Otherwise, the value is undefined.

Below, expression always means real algebraic expression. An expression is given as a 
directed acyclic graph (dag) whose source nodes are labeled by the operands and whose 
internal nodes are labeled by operators. We call an expression simple if only items (1) 
and (2) are used in its definition and we call it simple and division-free if, in addition, 
no division operator occurs in the expression. 

The starting point for the present work is the bound given by Burnikel et al. […] for simple expressions. We refer to this bound as the BFMS bound in the sequel.

Lemma 1 ([…]). Let E be an expression with integral operands and operations +, -, ·, /, and $\sqrt[k]{\ }$ for integral $k \ge 2$. Let ξ be the value of E, let the weight D(E) of E be the product of the indices (the index of a $\sqrt[k]{\ }$ operation is k) of the radical operations in E, and let u(E) and l(E) be defined inductively on the structure of E by the rules shown in the table below.

E                  u(E)                          l(E)
integer N          |N|                           1
E1 ± E2            u(E1)·l(E2) + l(E1)·u(E2)     l(E1)·l(E2)
E1 · E2            u(E1)·u(E2)                   l(E1)·l(E2)
E1 / E2            u(E1)·l(E2)                   l(E1)·u(E2)
k-th root of E1    u(E1)^(1/k)                   l(E1)^(1/k)

Then ξ = 0 or $\bigl(l(E)\,u(E)^{D(E)^2-1}\bigr)^{-1} \le |\xi| \le u(E)\,l(E)^{D(E)^2-1}$. If E is division-free, l(E) = 1, and the above bound holds with $D(E)^2$ replaced by D(E).

Observe the difference between the division-free case and the general case. For simple division-free expressions, the BFMS bound is the best bound known. Expressions with divisions arise naturally in geometric applications. Inputs to expressions are frequently fractions and, e.g., normalizing a line equation amounts to a division. For expressions with divisions, the BFMS bound is much weaker than for expressions without divisions. We give an example. Consider the expression E = […], whose final operation is a redundant division. Here $u(E) \approx 2^{10}$, l(E) = 1 and D(E) = 8. So the BFMS bound is $2^{-10 \cdot (8^2 - 1)} = 2^{-630}$, since E is not division-free and hence the dependence (of the logarithm of the bound) on D is quadratic. Without the final redundant division, the expression is division-free and the bound becomes $2^{-10 \cdot (8 - 1)} = 2^{-70}$. Our new bound handles divisions much better and also applies to a wider class of expressions than the BFMS bound.

This paper is structured as follows. In Section 2 we review the proof of the BFMS bound and motivate our new way of dealing with divisions. In Section 3 we prove our main theorem, a separation bound for expressions defined by (1), (2), and (3). In Sections 4 and 5 we compare our bound analytically and experimentally to previous bounds.

2 A Review of the BFMS Bound 

An algebraic integer is a root of a polynomial with integer coefficients and leading coefficient one. The following three lemmas were already used in […] and […].

Lemma 2. Let α be a non-zero algebraic integer and let deg(α) be the algebraic degree of α. If U is an upper bound on the absolute value of all conjugates of α, then $|\alpha| \ge U^{1-\deg(\alpha)}$.

Proof. The proof is simple. Let d be the degree of α and let $\alpha_1 = \alpha, \alpha_2, \ldots, \alpha_d$ be the conjugates of α. The product of the conjugates is equal, up to sign, to the constant coefficient of the defining polynomial and hence a non-zero element of Z. Thus $|\alpha| \cdot U^{d-1} \ge 1$. □
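Lemma 2 is easy to check on small cases. The Python sketch below is our own illustration, restricted to irreducible monic quadratics $X^2 + bX + c$ with integers b, c and c ≠ 0, so that both roots are conjugate algebraic integers of degree 2. It computes both conjugates and the lower bound $1/U$; for $X^2 + 2X - 1$ the bound is attained, since $\sqrt{2} - 1 = 1/(\sqrt{2} + 1)$.

```python
import cmath

def lemma2_check(b, c):
    """For X^2 + bX + c (assumed irreducible over Q, c != 0) return (min |root|, U^(1 - deg))."""
    disc = cmath.sqrt(b * b - 4 * c)
    roots = [(-b + disc) / 2, (-b - disc) / 2]      # the two conjugates
    U = max(abs(r) for r in roots)                   # bound on all conjugates
    lower = 1 / U                                    # U^(1 - deg) with deg = 2
    return min(abs(r) for r in roots), lower

for b, c in [(2, -1), (0, -2), (1, 1), (3, 1)]:
    smallest, lower = lemma2_check(b, c)
    print(b, c, smallest, lower)
```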

Lemma 3 ([…] or […, Theorem 4]). Let α and β be algebraic integers. Then α ± β, αβ, and $\sqrt[k]{\alpha}$ are algebraic integers.

We also need to cover item (3) in the definition of algebraic expressions. 

Lemma 4. Let ρ be a root of a monic polynomial $P(X) = X^n + a_{n-1}X^{n-1} + \cdots + a_0$ of degree n where the coefficients $a_{n-1}, \ldots, a_0$ are algebraic integers. Then ρ is an algebraic integer.

Proof. This fact is well known; a proof can, for example, be found in […, Theorem 2.4]. We include a proof for completeness. The proof uses an argument similar to the proof of Lemma […]. Let $a_j^{(i_j)}$, $1 \le i_j \le \deg(a_j)$, be the conjugates of $a_j$ for $0 \le j \le n-1$ and let $\bar{a}_j$ be the vector formed by the conjugates of $a_j$. Consider the polynomial

$$Q(X) = \prod_{i_0} \prod_{i_1} \cdots \prod_{i_{n-1}} \bigl(X^n + a_{n-1}^{(i_{n-1})} X^{n-1} + \cdots + a_1^{(i_1)} X + a_0^{(i_0)}\bigr).$$

ρ is a root of Q(X), and Q(X) is symmetric in the $a_j^{(i)}$ for every j. The theorem on elementary symmetric functions implies that Q(X) is a polynomial in X and the elementary symmetric functions $\sigma_1(\bar{a}_j), \ldots, \sigma_{\deg(a_j)}(\bar{a}_j)$. The elementary symmetric function $\sigma_i(\bar{a}_j)$ is, up to sign, the coefficient of $X^{\deg(a_j)-i}$ in the minimal polynomial of $a_j$ and hence in Z (since $a_j$ is an algebraic integer). Thus Q(X) is a monic polynomial in Z[X] and ρ is an algebraic integer. □

Lemma 5 ([…, Lemma 6]). Let α and β be algebraic integers and let $U_\alpha$ and $U_\beta$ be upper bounds on the absolute size of the conjugates of α and β, respectively. Then $U_\alpha + U_\beta$ is an upper bound on the absolute size of the conjugates of α ± β, $U_\alpha U_\beta$ is an upper bound on the absolute size of the conjugates of αβ, and $\sqrt[k]{U_\alpha}$ is an upper bound on the absolute size of the conjugates of $\sqrt[k]{\alpha}$.

We also need bounds for the absolute size of roots of monic polynomials. Let $P(X) = X^n + a_{n-1}X^{n-1} + a_{n-2}X^{n-2} + \cdots + a_0$ be a monic polynomial with arbitrary real coefficients, not necessarily integral, and let α be a root of P(X). A root bound is any function Φ of the coefficients of P that bounds the absolute value of α, i.e., $|\alpha| \le \Phi(a_{n-1}, a_{n-2}, \ldots, a_0)$. We require that Φ is monotone, i.e., if $|a_i| \le b_i$ for $0 \le i \le n-1$, then $\Phi(a_{n-1}, a_{n-2}, \ldots, a_0) \le \Phi(b_{n-1}, b_{n-2}, \ldots, b_0)$.

Examples of root bounds are:

$$|\alpha| \le 2 \max\bigl(|a_{n-1}|, \sqrt{|a_{n-2}|}, \sqrt[3]{|a_{n-3}|}, \ldots, \sqrt[n]{|a_0|}\bigr)$$

$$|\alpha| \le 1 + \max\bigl(|a_{n-1}|, |a_{n-2}|, \ldots, |a_0|\bigr)$$

$$|\alpha| \le \max\bigl(n|a_{n-1}|, \sqrt{n|a_{n-2}|}, \sqrt[3]{n|a_{n-3}|}, \ldots, \sqrt[n]{n|a_0|}\bigr)$$

A proof of all bounds can be found in […]. The first bound is called the Lagrange-Zassenhaus bound and the last two bounds are called the Cauchy bounds.
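The three root bounds can be evaluated directly from the coefficient list. The Python sketch below (function names are ours) takes the coefficients $a_{n-1}, \ldots, a_0$ of a monic polynomial, highest degree first, and evaluates each bound on $X^3 - 7X + 6 = (X-1)(X-2)(X+3)$, whose largest root in absolute value is 3.

```python
def lagrange_zassenhaus(coeffs):
    """2 * max_i |a_{n-i}|^(1/i) for X^n + coeffs[0] X^(n-1) + ... + coeffs[-1]."""
    return 2 * max(abs(a) ** (1 / i) for i, a in enumerate(coeffs, start=1))

def cauchy_simple(coeffs):
    """1 + max_i |a_i|."""
    return 1 + max(abs(a) for a in coeffs)

def cauchy_scaled(coeffs):
    """max_i (n |a_{n-i}|)^(1/i)."""
    n = len(coeffs)
    return max((n * abs(a)) ** (1 / i) for i, a in enumerate(coeffs, start=1))

coeffs = [0, -7, 6]            # X^3 + 0 X^2 - 7 X + 6, roots 1, 2, -3
for bound in (lagrange_zassenhaus, cauchy_simple, cauchy_scaled):
    print(bound.__name__, bound(coeffs))
```

Each function is monotone in the absolute values of the coefficients, as the definition of a root bound requires.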

We next briefly review the proof of the BFMS bound. For a division-free simple expression E one observes that the value ξ of E is an algebraic integer (by Lemma 3) and that u(E) is an upper bound on ξ and all its conjugates (by Lemma 5). Furthermore, D(E) is an upper bound on the algebraic degree of ξ. Thus $|\xi| \le u(E)$ and, if ξ ≠ 0, $|\xi| \ge u(E)^{1-D(E)}$ by Lemma 2.

Expressions with divisions are handled by reduction to the division-free case. Let E be a simple expression and let ξ be its value. We construct a new expression dag, also with value ξ = val(E), containing only a single division. Moreover, the division is the final operation in the dag, and hence $val(E) = val(E_1)/val(E_2)$, where $E_1$ and $E_2$ are the inputs to the division. The bounds for the division-free case apply to $E_1$ and $E_2$, and $D(E_1)$ and $D(E_2)$ are at most $D(E)^2$. The construction of the new dag is straightforward. For every node A in the original dag there are two nodes $A_1$ and $A_2$ in the new dag such that $val(A) = val(A_1)/val(A_2)$. For the leaves (which stand for integers) the replacement is trivial (we take $A_1 = A$ and $A_2 = 1$), and for interior nodes we use the rules

$$\frac{A_1}{A_2} \pm \frac{B_1}{B_2} = \frac{A_1 B_2 \pm A_2 B_1}{A_2 B_2}, \qquad \frac{A_1}{A_2} \cdot \frac{B_1}{B_2} = \frac{A_1 B_1}{A_2 B_2}, \qquad \frac{A_1/A_2}{B_1/B_2} = \frac{A_1 B_2}{A_2 B_1}, \qquad \sqrt[k]{\frac{A_1}{A_2}} = \frac{\sqrt[k]{A_1}}{\sqrt[k]{A_2}}.$$

In this way, each root operation in the original dag gives rise to two root operations in the new dag. This may square the D-value of the expression.
The starting point for the present paper was a simple but powerful observation. Although the transformation rules above are natural, they are not the only way of obtaining division-free expressions $E_1$ and $E_2$ with $val(E) = val(E_1)/val(E_2)$. Instead of the last rule we may also use

$$\sqrt[k]{\frac{A_1}{A_2}} = \frac{\sqrt[k]{A_1 A_2^{k-1}}}{A_2} \qquad \text{or} \qquad \sqrt[k]{\frac{A_1}{A_2}} = \frac{A_1}{\sqrt[k]{A_1^{k-1} A_2}}.$$

The new rule does not increase the total degree of the expression, and hence $D(E_1)$ and $D(E_2)$ are at most D(E). In an earlier version of the paper, we only used the first alternative of the new rule. Chee Yap (personal communication, January 2001) pointed out to us that it is advantageous to have both rules (see the proof of Lemma 6).

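3 The New Bound

We derive a separation bound for the expressions defined by items (1) to (3). For items (1) and (2), we use the BFMS rules with the modification proposed in the previous paragraph.

The diamond operation allows one to take the root of a polynomial $P(X) = a_d X^d + a_{d-1} X^{d-1} + \cdots + a_1 X + a_0$ where the $a_i$ are arbitrary real algebraic numbers. Every real algebraic number can be written as the quotient of two algebraic integers; this is well known, but will be reproved below as part of the proof of our main theorem. Let $a_i = \nu_i / s_i$, where $\nu_i$ and $s_i$ are algebraic integers. Then $P(X) = \frac{\nu_d}{s_d} X^d + \frac{\nu_{d-1}}{s_{d-1}} X^{d-1} + \cdots + \frac{\nu_1}{s_1} X + \frac{\nu_0}{s_0}$. Let $D = \prod_i s_i$. By multiplication with D we obtain

$$D \cdot P(X) = \frac{\nu_d D}{s_d} X^d + \frac{\nu_{d-1} D}{s_{d-1}} X^{d-1} + \cdots + \frac{\nu_0 D}{s_0},$$

a polynomial with algebraic integral coefficients (each $D/s_i = \prod_{j \ne i} s_j$ is an algebraic integer).

We next derive a monic polynomial. To get rid of the leading coefficient $\nu_d D/s_d$, we multiply by $(\nu_d D/s_d)^{d-1}$ and substitute $X/(\nu_d D/s_d)$ for X. We obtain $Q(X) = D \cdot (\nu_d D/s_d)^{d-1} \cdot P\bigl(X/(\nu_d D/s_d)\bigr)$ with

$$Q(X) = X^d + \frac{\nu_{d-1} D}{s_{d-1}} X^{d-1} + \frac{\nu_d D}{s_d} \cdot \frac{\nu_{d-2} D}{s_{d-2}} X^{d-2} + \cdots + \Bigl(\frac{\nu_d D}{s_d}\Bigr)^{d-1} \frac{\nu_0 D}{s_0},$$

which is monic and has algebraic integer coefficients. The root bounds of Section 2 provide us with an upper bound on the size of the roots of Q(X): the size of any root of Q(X) is bounded by

$$u = \Phi\Bigl(\frac{\nu_{d-1} D}{s_{d-1}}, \frac{\nu_d D}{s_d} \cdot \frac{\nu_{d-2} D}{s_{d-2}}, \ldots, \Bigl(\frac{\nu_d D}{s_d}\Bigr)^{d-1} \frac{\nu_0 D}{s_0}\Bigr).$$

Since the roots of P are simply the roots of Q divided by $\nu_d D/s_d$, this suggests extending the definitions of u and l as follows: for an expression E denoting a root of a polynomial of degree d with coefficients given by $E_d, E_{d-1}, E_{d-2}, \ldots, E_0$ we define

$$l(E) = u(E_d) \prod_{k \ne d} l(E_k), \qquad u(E) = \Phi\Bigl(u(E_{d-1}) \prod_{k \ne d-1} l(E_k), \ \ldots, \ l(E)^{d-1-i}\, u(E_i) \prod_{k \ne i} l(E_k), \ \ldots, \ l(E)^{d-1}\, u(E_0) \prod_{k \ne 0} l(E_k)\Bigr).$$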
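For integer (rather than algebraic integer) coefficients the construction of Q(X) can be tried out directly. The Python sketch below (helper names are ours) forms $Q(X) = a_d^{d-1} P(X/a_d)$ for $P(X) = a_d X^d + \cdots + a_0$, so the coefficient of $X^i$ becomes $a_i a_d^{d-1-i}$, and checks with exact rational arithmetic that the roots of Q are the roots of P multiplied by $a_d$. This is the substitution above with $\nu_d D/s_d$ replaced by $a_d$ and D = 1.

```python
from fractions import Fraction

def monic_transform(coeffs):
    """coeffs = [a_d, ..., a_0] (highest degree first, a_d != 0).

    Returns [1, b_{d-1}, ..., b_0] with b_i = a_i * a_d^(d-1-i), the coefficients of
    Q(X) = a_d^(d-1) * P(X / a_d), a monic polynomial whose roots are a_d times those of P.
    """
    a_d, d = coeffs[0], len(coeffs) - 1
    if a_d == 0:
        raise ValueError("leading coefficient must be non-zero")
    out = [1]
    for j, a in enumerate(coeffs[1:], start=1):   # a is a_{d-j}
        i = d - j
        out.append(a * a_d ** (d - 1 - i))
    return out

def evaluate(coeffs, x):
    """Horner evaluation, highest degree first."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

P = [2, -3, 1]                                  # 2X^2 - 3X + 1 = (2X - 1)(X - 1)
Q = monic_transform(P)                          # X^2 - 3X + 2
for r in (Fraction(1), Fraction(1, 2)):
    print(r, evaluate(P, r), evaluate(Q, P[0] * r))
```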



We still need to define the weight D(E) of an expression. We do so in the obvious way. The weight D(E) of an expression dag E is the product of the weights of the nodes and leaves of the dag. Leaves and +, -, · and / operations have weight 1, a $\sqrt[k]{\ }$ node has weight k, and a $\diamond(j, E_d, \ldots)$ operation has weight d.

We can now state our main theorem.



Theorem 1. Let E be an expression with integer operands and operations +, -, ·, /, $\sqrt[k]{\ }$ for integral k, and $\diamond(j, \ldots)$ operations, and let D(E) be the weight of E. Let ξ be the value of E. Let u(E) and l(E) be defined inductively on the structure of E according to the following rules:

E                                 u(E)                          l(E)
integer N                         |N|                           1
E1 ± E2                           u(E1)·l(E2) + l(E1)·u(E2)     l(E1)·l(E2)
E1 · E2                           u(E1)·u(E2)                   l(E1)·l(E2)
E1 / E2                           u(E1)·l(E2)                   l(E1)·u(E2)
k-th root of E1, u(E1) ≥ l(E1)    (u(E1)·l(E1)^(k-1))^(1/k)     l(E1)
k-th root of E1, u(E1) < l(E1)    u(E1)                         (u(E1)^(k-1)·l(E1))^(1/k)
◇(j, Ed, ..., E0)                 Φ(b_(d-1), ..., b_0)          u(Ed)·∏_{k≠d} l(Ek)

where, in the last row, $b_i = l(E)^{d-1-i} \cdot u(E_i) \cdot \prod_{k \ne i} l(E_k)$ for $i = d-1, \ldots, 0$.

Then either ξ = 0 or $\bigl(l(E)\,u(E)^{D(E)-1}\bigr)^{-1} \le |\xi| \le u(E)\,l(E)^{D(E)-1}$.

Proof. We show that the rules for u and l maintain the invariant that there are algebraic integers β and γ such that ξ = β/γ, u(E) is an upper bound on the absolute size of the conjugates of β, and l(E) is an upper bound on the absolute size of the conjugates of γ.

We prove this by induction on the structure of E. The base case is trivial. If E is an integer N, we take β = N and γ = 1; β is the root of the polynomial X - N and γ is the root of X - 1.

Now let $E = E_1 \pm E_2$. By the induction hypothesis we have $\xi_j = \beta_j/\gamma_j$ for j = 1, 2. We set $\beta = \beta_1\gamma_2 \pm \beta_2\gamma_1$ and $\gamma = \gamma_1\gamma_2$. Since algebraic integers are closed under additions, subtractions, and multiplications, β and γ are algebraic integers. By Lemma 5, $u(E) = u(E_1) \cdot l(E_2) + l(E_1) \cdot u(E_2)$ is an upper bound on the absolute size of the conjugates of β. Similarly, l(E) is an upper bound on the absolute size of the conjugates of γ.

If $E = E_1 \cdot E_2$, we set $\beta = \beta_1\beta_2$ and $\gamma = \gamma_1\gamma_2$. The claim follows analogously to the previous case by Lemma 5.

If $E = E_1/E_2$, we set $\beta = \beta_1\gamma_2$ and $\gamma = \beta_2\gamma_1$. Again, the claim follows by Lemma 5.

If $E = \sqrt[k]{E_1}$ and $u(E_1) \ge l(E_1)$, we set $\beta = \sqrt[k]{\beta_1\gamma_1^{k-1}}$ and $\gamma = \gamma_1$. Since algebraic integers are closed under k-th root operations, β is an algebraic integer. By Lemma 5, u(E) is an upper bound on the absolute size of the conjugates of β. There is nothing to show for $\gamma = \gamma_1$.

If $E = \sqrt[k]{E_1}$ and $u(E_1) < l(E_1)$, we set $\beta = \beta_1$ and $\gamma = \sqrt[k]{\beta_1^{k-1}\gamma_1}$. Since algebraic integers are closed under k-th root operations, γ is an algebraic integer. By Lemma 5, l(E) is an upper bound on the absolute size of the conjugates of γ. There is nothing to show for $\beta = \beta_1$.

Finally, let E be defined by a $\diamond(j, E_d, \ldots, E_0)$ operation. We set

$$\beta = \diamond\bigl(j, 1, \beta_{d-1}\gamma_d\gamma_{d-2}\cdots\gamma_0, \ \gamma\,\beta_{d-2}\gamma_d\gamma_{d-1}\gamma_{d-3}\cdots\gamma_0, \ \ldots, \ \gamma^{d-1}\beta_0\gamma_d\gamma_{d-1}\cdots\gamma_1\bigr)$$

and $\gamma = \beta_d\gamma_{d-1}\gamma_{d-2}\cdots\gamma_0$. By the discussion preceding the statement of our main theorem, ξ = β/γ, β and γ are algebraic integers, l(E) is an upper bound on the absolute size of the conjugates of γ, and u(E) is an upper bound on the absolute size of the conjugates of β. This completes the induction step.

Rewriting ξ as β/γ corresponds to a restructuring of the expression dag defining E into an expression dag E' with a single division operation. We have D(E') = D(E).

We still need to argue that D(E) is an upper bound on the algebraic degree of β. This follows from the fact that every operation leads to a field extension whose degree is bounded by the weight of the operation.

We have now collected all ingredients to bound the absolute value of ξ from below. If ξ ≠ 0, we have β ≠ 0. The absolute value of β and all its conjugates is bounded by u(E). Thus $|\beta| \ge u(E)^{1-\deg(\beta)}$ by Lemma 2. Also $|\gamma| \le l(E)$. Thus

$$|\xi| = \frac{|\beta|}{|\gamma|} \ge \frac{1}{u(E)^{\deg(\beta)-1}} \cdot \frac{1}{l(E)} \ge \frac{1}{u(E)^{D(E)-1}} \cdot \frac{1}{l(E)}.$$

□

The value of an algebraic expression may be undefined. Divisions by zero and taking a root of even degree of a negative number are easily caught by the sign test. We next argue that the sign test also allows us to test whether the diamond operation is well defined. For this purpose, we need to determine the number of zeros of a polynomial. Sturm sequences, see […, Chapter 5] or […, Chapter 7], are the appropriate tool. The computation of Sturm sequences amounts to a gcd computation between a polynomial and its derivative. Our sign test is sufficient to implement a gcd computation.
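To make the role of Sturm sequences concrete, here is a minimal Python sketch with exact rational coefficients (via `fractions.Fraction`), so every sign it needs is exact; in the setting of the paper those signs would instead be supplied by the separation-bound sign test. The function names are ours. It counts distinct real roots, which is what is needed to decide whether $\diamond(j, \ldots)$ has at least j real roots.

```python
from fractions import Fraction

def poly_rem(a, b):
    """Remainder of a / b; polynomials are coefficient lists, highest degree first."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)                                # leading coefficient is now exactly zero
    while a and a[0] == 0:                      # strip leading zeros
        a.pop(0)
    return a

def derivative(p):
    n = len(p) - 1
    return [c * (n - j) for j, c in enumerate(p[:-1])]

def sturm_sequence(p):
    """p_0 = p, p_1 = p', p_{i+1} = -rem(p_{i-1}, p_i) until the remainder vanishes."""
    seq = [[Fraction(c) for c in p], [Fraction(c) for c in derivative(p)]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])

def sign_changes(values):
    signs = [v > 0 for v in values if v != 0]   # zeros are skipped
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def count_real_roots(p):
    """Number of distinct real roots of p (degree >= 1, leading coefficient non-zero)."""
    seq = sturm_sequence(p)
    at_pos_inf = [q[0] for q in seq]
    at_neg_inf = [q[0] * (-1) ** (len(q) - 1) for q in seq]
    return sign_changes(at_neg_inf) - sign_changes(at_pos_inf)

def count_roots_between(p, a, b):
    """Distinct real roots in (a, b), assuming a < b and p(a), p(b) != 0."""
    def value(q, x):
        acc = Fraction(0)
        for c in q:
            acc = acc * x + c
        return acc
    seq = sturm_sequence(p)
    return (sign_changes([value(q, Fraction(a)) for q in seq])
            - sign_changes([value(q, Fraction(b)) for q in seq]))

print(count_real_roots([1, 0, -7, 6]))          # (X - 1)(X - 2)(X + 3)
```

The repeated remainder step is exactly the Euclidean gcd computation between p and p' mentioned in the text.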

4 Comparison to Other Constructive Root Bounds

We compare our new bound to previous root bounds provided by Mignotte […], Canny […], Dube/Yap […], BFMS […], Scheinerman […], and Li/Yap […]. We refer to the bound presented in this paper as BFMSS. Root bounds can be compared along two axes: according to the class of expressions to which they apply and according to their value.

The bounds by Mignotte, Dube/Yap, and Scheinerman apply to division-free simple expressions; BFMS applies to simple expressions. The bounds in [9] and […] apply to expressions defined by items (1) to (3) with the restriction that the $E_d$ to $E_0$ in (3) must be integers. Canny's bound is the most general. It applies to algebraic numbers defined by systems of multivariate polynomial equations with integer coefficients.

We next discuss the quality of the bounds. In […] it was already shown that the BFMS bound is never worse than the bounds by Mignotte, Canny, Dube/Yap, and Scheinerman. In [3] it was also shown that the BFMS bound and the Li/Yap bound are incomparable.

Lemma 6 (C. Yap, personal communication). Let E be an arbitrary simple expression, let u and l be defined as in the original BFMS bound, let u' and l' be defined as in Theorem 1, and let D = D(E) be the degree bound of E. Then

$$l(E)\,u(E)^{D^2-1} \ge l'(E)\,u'(E)^{D-1},$$

i.e., the improved bound is always at least as strong as the original BFMS bound.
Proof. We show $u(E)/l(E) = u'(E)/l'(E)$, $u'(E) \le u(E)^{D(E)}$, and $l'(E) \le l(E)^{D(E)}$ by induction on the structure of E. Assume that these relations hold. Then

$$l(E)\,u(E)^{D^2-1} = \frac{l(E)}{u(E)} \bigl(u(E)^D\bigr)^D \ge \frac{l'(E)}{u'(E)}\, u'(E)^D = l'(E)\,u'(E)^{D-1}$$

and we are done.

The proof of the equality is a simple induction on the structure of E. The base case is clear. In the inductive step we write $u_i$ instead of $u(E_i)$, and similarly for l, u', and l'. If $E = E_1 + E_2$, we have

$$\frac{u(E)}{l(E)} = \frac{u_1 l_2 + l_1 u_2}{l_1 l_2} = \frac{u_1}{l_1} + \frac{u_2}{l_2} = \frac{u'_1}{l'_1} + \frac{u'_2}{l'_2} = \frac{u'(E)}{l'(E)}.$$

Multiplication and division are handled similarly. If $E = \sqrt[k]{E_1}$, we have (assuming $u'_1 \ge l'_1$; the case $u'_1 < l'_1$ is handled similarly)

$$\frac{u(E)}{l(E)} = \sqrt[k]{\frac{u_1}{l_1}} = \sqrt[k]{\frac{u'_1}{l'_1}} = \frac{\sqrt[k]{u'_1 (l'_1)^{k-1}}}{l'_1} = \frac{u'(E)}{l'(E)}.$$

For the inequalities we have to work slightly harder. The base case is again clear; observe that D = 1 in the base case. It is also clear that u(E) ≥ 1 (or u(E) = 0) and l(E) ≥ 1 for all E. If $E = E_1 \pm E_2$, we have (using $D \ge D_1$ and $D \ge D_2$)

$$u(E)^D = (u_1 l_2 + u_2 l_1)^D \ge (u_1 l_2)^D + (u_2 l_1)^D \ge u_1^{D_1} l_2^{D_2} + u_2^{D_2} l_1^{D_1} \ge u'_1 l'_2 + u'_2 l'_1 = u'(E)$$

and $l(E)^D = (l_1 l_2)^D \ge l_1^{D_1} l_2^{D_2} \ge l'_1 l'_2 = l'(E)$. Multiplication and division are handled similarly. If $E = \sqrt[k]{E_1}$, we have $D(E_1) = D(E)/k$ and hence (assuming $u'_1 \ge l'_1$; the case $u'_1 < l'_1$ is handled similarly) $u(E)^D = u_1^{D/k} = u_1^{D_1} \ge u'_1 \ge \sqrt[k]{u'_1 (l'_1)^{k-1}} = u'(E)$ and $l(E)^D = l_1^{D/k} = l_1^{D_1} \ge l'_1 = l'(E)$. □

We next show that the new bound can be significantly better than the old bound. Consider the expression $F = \sqrt[k]{x/a}$ and $E = F - F$, where x is a ck-bit integer for some constant c and a is a d-bit integer for some constant d. Then D(E) = k. We evaluate both bounds as functions of k.

For the BFMS bound we have $\log u(F) = (1/k)\,ck = c$, $\log l(F) = d/k$, $\log u(E) = 1 + c + d/k$, $\log l(E) = 2d/k$, and hence the BFMS bit bound is

$$(D^2 - 1)\log u(E) + \log l(E) = \Theta(k^2).$$

For the BFMSS bound we have $\log u(F) = (1/k)(ck + (k-1)d) = c + d - d/k$, $\log l(F) = d$, $\log u(E) = 1 + c + 2d - d/k$, $\log l(E) = 2d$, and hence the BFMSS bit bound is

$$(k-1)\log u(E) + \log l(E) = \Theta(k).$$
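The two rule sets are easy to mechanize. The Python sketch below is our own illustration (the class and function names are hypothetical and unrelated to the LEDA or CORE code). It evaluates $\log_2 u(E)$ and $\log_2 l(E)$ bottom-up over an expression dag with the rules of Lemma 1 (BFMS) and of Theorem 1 (BFMSS, without the diamond operation), computes the weight D(E) with shared subexpressions counted once, and reports the bit bound log(1/sep(E)). Working with logarithms avoids float overflow for large k. For $E = F - F$ with $F = \sqrt[k]{x/a}$ it shows quadratic versus linear growth in k.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)                 # identity hashing: a shared node is one dag node
class Node:
    op: str                          # 'int', '+', '-', '*', '/', 'root'
    value: int = 0                   # for 'int' (assumed non-zero)
    k: int = 0                       # index of a 'root' node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def log2_add(a, b):
    """log2(2^a + 2^b) without overflow."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1 + 2 ** (lo - hi))

def log_bounds(node, rule, memo):
    """(log2 u(E), log2 l(E)) under rule 'bfms' (Lemma 1) or 'bfmss' (Theorem 1)."""
    if node in memo:
        return memo[node]
    if node.op == 'int':
        res = (math.log2(abs(node.value)), 0.0)
    elif node.op == 'root':
        u1, l1 = log_bounds(node.left, rule, memo)
        k = node.k
        if rule == 'bfms':
            res = (u1 / k, l1 / k)
        elif u1 >= l1:
            res = ((u1 + (k - 1) * l1) / k, l1)
        else:
            res = (u1, ((k - 1) * u1 + l1) / k)
    else:
        u1, l1 = log_bounds(node.left, rule, memo)
        u2, l2 = log_bounds(node.right, rule, memo)
        if node.op in ('+', '-'):
            res = (log2_add(u1 + l2, l1 + u2), l1 + l2)
        elif node.op == '*':
            res = (u1 + u2, l1 + l2)
        else:                        # '/'
            res = (u1 + l2, l1 + u2)
    memo[node] = res
    return res

def dag_nodes(node, seen):
    if node is not None and node not in seen:
        seen.add(node)
        dag_nodes(node.left, seen)
        dag_nodes(node.right, seen)
    return seen

def bit_bound(expr, rule):
    nodes = dag_nodes(expr, set())
    D = math.prod(n.k for n in nodes if n.op == 'root')
    log_u, log_l = log_bounds(expr, rule, {})
    if rule == 'bfms' and any(n.op == '/' for n in nodes):
        return log_l + (D * D - 1) * log_u      # Lemma 1, general case
    return log_l + (D - 1) * log_u              # Lemma 1 division-free, or Theorem 1

def example(c, d, k):
    """E = F - F with F = k-th root of x/a, x a ck-bit and a a d-bit integer."""
    x = Node('int', value=2 ** (c * k - 1) + 1)
    a = Node('int', value=2 ** (d - 1) + 1)
    F = Node('root', k=k, left=Node('/', left=x, right=a))
    return Node('-', left=F, right=F)

for k in (8, 16, 32, 64):
    E = example(10, 10, k)
    print(k, round(bit_bound(E, 'bfms')), round(bit_bound(E, 'bfmss')))
```

Doubling k roughly quadruples the BFMS bit bound but only doubles the BFMSS bit bound, and the BFMSS value never exceeds the BFMS value, in line with Lemma 6.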

It remains to compare the BFMSS and the Li/Yap bound. For division-free simple 
expressions, the bounds are identical. For expressions with divisions, the bounds are 
incomparable. 

We start with an example where the BFMSS bound is significantly better. Let $E_0 = 17/3$,* let $F_i = \sqrt{E_{i-1}}$ and $E_i = F_i + F_i$ for $1 \le i \le k$, and let $E = E_k - E_k$. Then $\deg(E_i) = \deg(F_i) = 2^i$. We evaluate both bounds as functions of k.

For the BFMSS bound, we have $\log u(E_0) = \log 17$, $\log l(E_0) = \log 3$, $\log l(F_i) = \log l(E_{i-1})$, $\log u(F_i) = \frac{1}{2}\bigl(\log u(E_{i-1}) + \log l(E_{i-1})\bigr)$, $\log l(E_i) = 2 \log l(F_i) = 2^i \log 3$, and

$$\log u(E_i) = 1 + \log u(F_i) + \log l(F_i) = 1 + \tfrac{1}{2}\log u(E_{i-1}) + \tfrac{3}{2} \cdot 2^{i-1} \log 3 \le 2 + 2^{-i} \log 17 + 2^i \log 3,$$

and hence $\log u(E) = 1 + \log u(E_k) + \log l(E_k) \le 3 + 4 \cdot 2^k$ and $\log l(E) = 2 \log l(E_k) \le 4 \cdot 2^k$. We conclude that the BFMSS bit bound is at most

$$(2^k - 1)(3 + 4 \cdot 2^k) + 4 \cdot 2^k = \Theta(4^k).$$

Increasing k by one quadruples the number of bits.

The Li/Yap bound involves the lead coefficient of the minimal polynomial and is at least the logarithm of the lead coefficient. Li and Yap compute the following estimates lc for the lead coefficients. Let $d_i = D(E_i) = D(F_i) = 2^i$. Then $\log lc(E_0) = \log 3 \ge 1$, $\log lc(F_i) = \log lc(E_{i-1})$,

$$\log lc(E_i) = 2 \cdot d_i \cdot \log lc(F_i) = 2 \cdot 2^i \cdot \log lc(E_{i-1}) = 2^i \prod_{1 \le j \le i} 2^j \cdot \log 3 \ge 2^{i(i+1)/2},$$

$\log lc(E) = 2 \cdot 2^k \log lc(E_k)$, and hence the Li/Yap bit bound is $\Omega(2^{k^2/2})$. Increasing k by one multiplies the required number of bits by more than $2^k$.

* Any other fraction will also work as the initial value.

We next give an example where the Li/Yap bound is better. We start with the fraction 17/3, square k times, and then take square roots k times. The weight of the expression is $2^k$ and $\log u(E) \ge 2^k$. The BFMSS bit bound is therefore $\Omega(4^k)$. On the other hand, the Li/Yap bound is $O(2^k)$.

An implementation should compute the Li/Yap and BFMSS bounds and use the 
better of the bounds. 

5 Experimental Evaluation 

The separation bound approach to sign determination of algebraic numbers is used in the number types real of LEDA […] and Expr of CORE […]. We report on the improvements in running time due to the new separation bounds and due to a recent reimplementation of leda_real. We also compare CORE and leda_real.

All tests are based on LEDA 4.2.1 with the most recent arithmetic module incorporated. For the tests with the CORE library we used CORE v1.3, available from […]; it uses the Li/Yap bound. All benchmarks were performed on a Sun Ultra 5 with 333 MHz and 128 MB RAM, running Solaris 2.7. We used g++ 2.95.2.1 as the compiler; times are always stated in seconds.

We briefly review the implementation of leda_real; a detailed description is available in […]. The number type supports the sign determination of simple algebraic expressions. Expressions are represented by their expression dag. The input values of E are contained in the leaves of the dag, every inner node corresponds to an arithmetic operation, and the root corresponds to E.

When the sign of an algebraic number E needs to be determined, the data type first computes a separation bound $q_E$. Using leda_bigfloat arithmetic (floating-point numbers with exponent and mantissa of arbitrary length), the data type successively computes intervals of decreasing length that include E, until the interval does not contain zero or the length of the interval is less than $q_E$.

Several shortcuts are used to speed up the computation of the sign. First, a double approximation $\tilde{E}$ and an error bound err such that $|E - \tilde{E}| \le err$ are stored with every node of the expression dag. As long as the double approximation $\tilde{E}$ is known to be exact, i.e., err = 0, no expression graph is constructed and $\tilde{E}$ represents E.

Secondly, if the double approximation $\tilde{E}$ suffices to determine the sign of E, i.e., $0 \notin [\tilde{E} - err, \tilde{E} + err]$, no bigfloat computation is triggered. This technique is called a floating-point filter.
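The two shortcuts can be sketched as follows. This Python class is our own minimal illustration, not LEDA's implementation; its name, the error model (each double operation contributes at most |result| · 2^-52, propagated bounds as stated in the comments) and the small inflation factor that compensates for rounding in the bound arithmetic are all assumptions. Results that are provably exact keep err = 0; a result whose interval contains 0 cannot be decided by the filter and would fall through to the bigfloat computation.

```python
import math
from fractions import Fraction

U = 2.0 ** -52          # assumed bound on the relative rounding error of one double operation
INFLATE = 1.0 + 8 * U   # crude safety factor: the bound arithmetic below is itself rounded

class FilteredDouble:
    """Double approximation x of a real value with an error bound err, |value - x| <= err."""

    def __init__(self, x, err=0.0):
        self.x = float(x)
        self.err = float(err)

    @staticmethod
    def _result(x, exact, propagated):
        # exact: the double result equals the exact result of the operation on the input doubles
        if exact and propagated == 0.0:
            return FilteredDouble(x, 0.0)       # still exact, like an err = 0 node in the text
        rounding = 0.0 if exact else abs(x) * U
        return FilteredDouble(x, (propagated + rounding) * INFLATE)

    def __add__(self, other):
        x = self.x + other.x
        exact = Fraction(x) == Fraction(self.x) + Fraction(other.x)
        return self._result(x, exact, self.err + other.err)

    def __sub__(self, other):
        x = self.x - other.x
        exact = Fraction(x) == Fraction(self.x) - Fraction(other.x)
        return self._result(x, exact, self.err + other.err)

    def __mul__(self, other):
        x = self.x * other.x
        exact = Fraction(x) == Fraction(self.x) * Fraction(other.x)
        # |AB - ab| <= |a| eB + |b| eA + eA eB when |A - a| <= eA and |B - b| <= eB
        propagated = abs(self.x) * other.err + abs(other.x) * self.err + self.err * other.err
        return self._result(x, exact, propagated)

    def sqrt(self):
        if self.x < self.err:
            raise ValueError("radicand not certified non-negative; fall back to bigfloats")
        x = math.sqrt(self.x)
        exact = Fraction(x) ** 2 == Fraction(self.x)
        # |sqrt(A) - sqrt(a)| <= sqrt(|A - a|), and <= |A - a| / sqrt(a) when a > 0
        propagated = math.sqrt(self.err)
        if self.x > 0:
            propagated = min(propagated, self.err / math.sqrt(self.x))
        return self._result(x, exact, propagated)

    def sign(self):
        """-1, 0 or 1 if the filter decides the sign; None if 0 lies in [x - err, x + err]."""
        if self.err == 0.0:
            return (self.x > 0) - (self.x < 0)
        if self.x > self.err:
            return 1
        if self.x < -self.err:
            return -1
        return None

f = FilteredDouble
zero = f(17).sqrt() + f(21).sqrt() - (f(17) + f(21) + f(2) * f(357).sqrt()).sqrt()
print(zero.sign(), (f(17).sqrt() - f(4)).sign(), (f(3) * f(5) - f(15)).sign())
```

The binomial expression from experiment (1) is exactly zero, so no finite error bound can separate it from 0 and the filter returns None; this is the case that needs the separation bound.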
In the reimplementation, we made the following improvements:

(1) The separation bound is the better of the Li/Yap and the BFMSS bound.

(2) The implementation of the underlying bigfloat arithmetic has been improved; originally it was based on the number type leda_integer for integers of arbitrary size, now it operates directly on vectors of long integers.

(3) Memory management within the real data type has been improved; in particular, space for the bigfloat approximations is now only allocated if bigfloat computation is necessary for a sign determination.

(4) The built-in floating-point filters have been improved, both with respect to running time and precision.

Overall, the efficiency has improved for 'easy instances' (i.e., instances that do not need the bigfloat computation) due to improved floating-point filter techniques, as well as for 'difficult instances' due to the improved separation bounds and bigfloat implementation.

We turn to our experiments. The source code of all experiments is available at http://www.mpi-sb.mpg.de/~funke/SepBoundESA01.html. Many of the experiments make use of L-bit random integers. We generated them outside the leda_real number type and used them as inputs for our expressions.

(1) The first test is a simple check of a binomial expression. Let x = a/b and y = c/d, where a, b, c, and d are L-bit integers, and let $E = (\sqrt{x} + \sqrt{y}) - \sqrt{x + y + 2\sqrt{xy}}$. For the old BFMS bound we get $sep_{BFMS} \approx 160L + 381$, for our improved bound $sep_{improv} \approx 96L + 60$, whereas the Li/Yap bound gives $sep_{LiYap} = 28L + 60$. This is, of course, reflected in the running times in Table 1, left half.

Table 1. Running times (in seconds) for experiments (1) and (2)

Experiment (1)
L             25      50     100     200     400     800    1600
BFMS        0.04    0.10    0.27    0.77    2.21    6.55   20.73
Improv      0.01    0.04    0.10    0.27    0.77    2.26    7.07
LiYap       0.00    0.01    0.02    0.04    0.11    0.29    0.91

Experiment (2)
L            500    1000    2000    4000    8000   16000
BFMS        0.01    0.03    0.08    0.25    0.72    2.33
Improv      0.01    0.03    0.08    0.24    0.73    2.32
LiYap       0.36    1.05    3.17    9.47    28.5    85.6

(2) Let x and y be L-bit integers, $C = (\sqrt{x} - \sqrt{y})/(x - y)$ and $E = C - C$. For both the old and the improved bound we get $sep_{BFMS} = sep_{improv} = 6L + 64$, whereas the Li/Yap bound gives $sep_{LiYap} = 65L + 91$. See Table 1, right half.

We now turn to examples for which we have already proved differing asymptotic behaviour of the bounds in Section 4.

(3) First consider $F = \sqrt[k]{x/y}$ and $E = F - F$, where x is a 100k-bit integer and y a 32-bit integer. The BFMS bound is $\Theta(k^2)$, whereas the new bound is $\Theta(k)$. The Li/Yap bound is also $\Theta(k)$ and even better than our new bound.

The running time of multiplication, division, and the root operation for L-bit numbers in leda_bigfloat is $O(L^{\log 3})$. Doubling k in the case of the BFMS bound quadruples the bit bound and hence multiplies the running time by about 9,* whereas in the case of the improved bound, the bit bound doubles and the running time roughly triples. See Table 2, left half, for the results.

* Since machines get slower as they use more memory, we see a factor of slightly more than 9.
Table 2. Running times and separation bounds for experiments (3) and (4)

Experiment (3)
k                      2         4         8        16        32        64
BFMS time           0.01      0.02      0.20      2.22     24.36      86.6
BFMS bound           391      1683      6751     26781    106396    421787
Improv. time        0.01      0.01      0.01      0.04      0.11      0.36
Improv. bound        214       538      1198      2524      5179     10459
LiYap time          0.01      0.01      0.01      0.03      0.04      0.12
LiYap bound          150       346       750      1564      3195      6427

Experiment (4)
k                      2         3         4         5         6
BFMS time           0.01      0.01      0.02      0.80      8.86
BFMS bound           237      1213      5885     27645    126973
Improv. time        0.01      0.01      0.01      0.04      0.32
Improv. bound         76       284      1084      4220     16636
LiYap time          0.01      0.01      1.98      1781  (too long)
LiYap bound          140      2076     65596   4194428 536871164

(4) For $E_0 = 17/3$, Fi = Ei = Fi + Fi, $E = E_k - E_k$, our bounds for E are $\Theta(4^k)$ (but with different constant factors), whereas the Li/Yap bound is $O(2^{k^2})$. See Table 2, right half, for the results.

In the following test we compare different implementations: real(1) denotes our old implementation and real(2) the new implementation.

(5) As in our first example we take x = a/b, y = c/d, where a, b, c, d are L-bit integers, and $E = \sqrt{x} + \sqrt{y} - \sqrt{x + y + 2\sqrt{xy}}$. As we can see in Table 3, the improved implementation of the LEDA real datatype already leads to a speedup by a factor of 4, even with the same separation bound. The new separation bound gives another speedup by a factor of 3. We did not expect the currently available CORE/Expr implementation to be that far behind, since it uses the Li/Yap bound, which is superior to our bounds in this example. We understand neither why there is no difference in running time between L = 100 and L = 200, nor the change in running time when doubling the bitlength of the input values.
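Separation bounds enter the running time because they fix the precision to which E must be evaluated before its sign is certain. A minimal sketch of that sign test, using Python's `decimal` module in place of `leda_bigfloat`: if $E \neq 0$ implies $|E| \geq 2^{-\mathrm{sep}}$, then an approximation with error below $2^{-(\mathrm{sep}+1)}$ whose magnitude is also below $2^{-(\mathrm{sep}+1)}$ proves $E = 0$. The function names, the 200-bit bound in the example, and the `magnitude_bits` precision heuristic are assumptions for illustration; this is not the LEDA or CORE implementation, and it does not compute any of the bounds discussed here.

```python
import math
from decimal import Decimal, localcontext

def sign_via_separation_bound(evaluate, sep_bits, magnitude_bits=64):
    """Return -1, 0 or 1 for the expression computed by `evaluate()`.

    Assumes (a) `sep_bits` is a valid separation bound: E != 0 implies
    |E| >= 2**-sep_bits, and (b) the chosen precision keeps the evaluation
    error below 2**-(sep_bits + 1). Here (b) is only a heuristic:
    `magnitude_bits` guesses the bit size of intermediate values.
    """
    digits = math.ceil((sep_bits + 1 + magnitude_bits) * math.log10(2)) + 10
    with localcontext() as ctx:
        ctx.prec = digits
        approx = evaluate()                        # Decimal ops use ctx
        threshold = Decimal(2) ** -(sep_bits + 1)
        if abs(approx) < threshold:
            return 0                               # then |E| < 2**-sep_bits, so E == 0
        return 1 if approx > 0 else -1

def experiment5(a, b, c, d):
    """E = sqrt(x) + sqrt(y) - sqrt(x + y + 2*sqrt(x*y)) with x = a/b, y = c/d."""
    x = Decimal(a) / Decimal(b)
    y = Decimal(c) / Decimal(d)
    return x.sqrt() + y.sqrt() - (x + y + 2 * (x * y).sqrt()).sqrt()

print(sign_via_separation_bound(lambda: experiment5(3, 7, 5, 11), sep_bits=200))
```

A larger `sep_bits` raises `digits` linearly, which is exactly where a weaker bound costs running time.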



Table 3. Experiment (5)

L                        25     50     100     200      400    800
real(1)                  0.12   0.30   0.98    2.63     8.48   23.97
real(2)                  0.04   0.10   0.27    0.77     2.21   6.55
real(2), sep_improv      0.01   0.04   0.10    0.27     0.77   2.26
CORE/Expr, sep_LiYap     2.32   15.7   116.9   116.84   692    3973

Table 4. Experiment (6)

L            50     100    200
double       0.08   0.08   0.08
real(1)      1.64   1.65   194
real(2)      1.22   1.23   120
CORE/Expr    568    555    672



(6) The final comparison concerns easy sign tests. The following expression arises during Fortune's sweep-line algorithm for Voronoi diagrams: $E = \frac{a + \sqrt{b}}{c} - \frac{a' + \sqrt{b'}}{c'}$, where a, a', b, b', c, and c' are random 3L-, 6L-, and 2L-bit integers, respectively. The root bounds do not play a role here; only the efficiency of the implementation, in particular of the floating-point filters, comes into play. To get meaningful results we measured the time of 200000 sign computations; see Table 4.

Clearly, pure double arithmetic is the fastest; creating the expression dag does not come without cost. But as you can see, our new implementation gains about 25% compared to the old one. The huge increase in running time for L = 200 can be explained by the fact that in this case the numbers get too large to be representable by a double (remember that we create integers of length 6L). Therefore the floating-point filters always fail and bigfloat arithmetic has to be used. CORE does not have built-in floating-point filters, so it is much slower than leda_real.
6 Conclusions

We presented a new separation bound which applies to a wide class of algebraic expressions and is easily computable. For many expressions it gives much better bounds than previous bounds, resulting in significant gains in running time. We see two main challenges: (1) For algebraic numbers defined by systems of polynomials, Canny's bound is the best bound known. Provide a better bound. (2) Our bound as well as the Li/Yap bound is very easy to compute. In the context of expensive sign computations it is worthwhile to investigate more expensive methods for computing separation bounds.
Abstract. This paper investigates geometric problems in the context of property testing algorithms. Property testing is an emerging area in computer science in which one aims at verifying whether a given object has a predetermined property or is “far” from any object having the property. Although there has been some previous research on testing geometric properties, prior work has mostly dealt with combinatorial notions of the distance that defines whether an object is “far” or “close”; very little research has been done on geometric notions of distance, that is, distance measures that are based on the geometry underlying the input objects.

The main objective of this work is to develop sound models to study geometric problems in the context of property testing. Compared to previous work in property testing, there are two novel aspects developed in this paper: geometric measures of being close to an object having the predetermined property, and the use of geometric data structures as basic primitives to design the testers. We believe that the second aspect is of special importance in the context of property testing and that the use of specialized data structures as basic primitives in the testers can be applied to other important problems in this area.

We discuss a number of models that in our opinion best fit geometric problems and apply them to study geometric properties for three very fundamental and representative problems in the area: testing convex position, testing map labeling, and testing clusterability.



1 Introduction 

A classical problem in computer science is to verify whether a given object possesses a certain property. For example, we want to determine whether a boolean formula is satisfiable, or whether a set of polygons in the Euclidean plane is intersection free. In its standard formulation, the goal is to give an exact solution to the problem, that is, to provide an algorithm that always returns a correct answer. In many situations, however, this formulation is too restrictive, for example because there is no fast (or fast enough) algorithm that gives the exact solution. Recently, many researchers have started studying a relaxation of the “exact decision task” and consider various forms of approximation algorithms

* Research supported in part by an SBR grant No. 421090, by DFG grant Me872/7-1, and by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
for decision problems. In property testing (see, e.g., [...]), one considers the following class of problems:






Let $\mathcal{C}$ be a class of objects, D be an unknown object from $\mathcal{C}$, and Q be a fixed property of objects from $\mathcal{C}$. The goal is to determine (possibly probabilistically) whether D has property Q or whether it is far from any object in $\mathcal{C}$ which has property Q, where the distance between two objects is measured with respect to some distribution $\mathcal{D}$ on $\mathcal{C}$.



The motivation behind this notion of property testing is that, while relaxing the exact decision task, we expect the testing algorithm to be significantly more efficient than any exact decision algorithm, and in many cases we achieve this goal by exploring only a small part of the input. For example, in [...] it is shown that all first-order graph properties of the type “∃∀” can be tested in time independent of the input size (see also [...] for some other striking results).

In the standard context of property testing, the first general study of geometric properties appeared in [...]. In that paper the authors studied property testing for classical geometric problems such as being in convex position, disjointness of geometric objects, Euclidean minimum spanning tree, etc. Roughly at the same time, property testing for some clustering problems was investigated in [...]. In [10], the problem of testing whether a given list of points in ℝ^2 represents a convex polygon is investigated. In all these papers, the common measure of being close to having the predetermined property was the Hamming distance. That is, for an object D from a class $\mathcal{C}$, a property Q, and a real ε, 0 < ε < 1, we say that D is ε-far from having property Q if any object D' from $\mathcal{C}$ that has property Q has Hamming distance at least ε·|D| from D. The Hamming distance is a standard measure for analyzing combinatorial problems, but in the opinion of the authors, other, more geometric distance measures should also be considered in the context of computational geometry. The reason is that this measure does not exploit the geometry underlying the investigated problems, but only their combinatorial structure (how many “atom” objects must be modified to transform the object into one possessing the required property). This issue has been partly explored in the context of the metrology of geometric tolerancing [5, 6, 8, 17, 19, 20]. In this area (motivated by manufacturing processes) one considers the problem of verifying whether a geometric object is within some given tolerance from having a certain property Q. In geometric tolerancing, researchers have studied, among others, the “roundness property,” the “flatness property,” etc. [5, 6, 8, 17]. We emphasize, however, that there is a major difference between geometric property testing and geometric tolerancing: in the former one may reject (as well as accept) any object that does not satisfy the property Q, while in geometric tolerancing one should accept such an object if it is within the given tolerance.



Our contribution. This paper is partly of a methodological character. Its main objective is to develop proper models to study geometric problems in the context of property testing. We discuss a number of models that in our opinion best fit geometric problems and apply them to study geometric properties for the most fundamental problems in the area. Compared to previous work in property testing, in the current paper we develop two main novel ideas:
geometric measures of being close to an object having the predetermined property,
and 

the use of geometric data structures to develop the testers. 

We discuss these two issues in detail in Sections 2 and 3 while testing convex position. We demonstrate the need for geometric distance measures for geometric problems and propose three new models that in our opinion are best suited to study geometric properties. We also show that the complexity measures used in standard property testing have to be modified in order to achieve something non-trivial using geometric distance measures. We propose a model of computation that uses queries for geometric primitives (in this case, range queries) as its basis and discuss its use and practical justification. Finally, we illustrate all these issues by designing property testing algorithms for convex position. Unlike in the model investigated in [2], our testing algorithms run in time either completely independent of the input size or with only a polylogarithmic dependency, and we believe that they fit the geometry underlying the problem of testing convex position much better.

In Section 4 we investigate the map labeling problem. We first show that in the classical property testing setting (which uses uniform sampling of the input points) this problem does not have fast testing algorithms. Next, we show that by using geometric queries as basic operations one can obtain very efficient testing algorithms. We present an ε-tester for map labeling that requires only poly(1/ε) range queries of the form: “What is the i-th point in the orthogonal range R?”

Then, in Section 5 we consider clustering problems in our context and provide efficient testers for clustering problems in most reasonable geometric models. The goal of a clustering problem is to partition a point set in ℝ^d into k different clusters such that the cost of each cluster is at most b. We consider three different variants of the clustering problem (see, e.g., [...]): radius clustering, discrete radius clustering, and diameter clustering. We say that a set of points is ε-far from clusterable with k clusters of size b if there is no clustering into k clusters of size (1 + ε)b. We show that it is possible to test clusterability using O(k/ε^d) oracle range queries.

Compared to the results in [...], we use a more powerful oracle, but we also have a more restrictive distance measure. Using our distance measure and the classical oracle from [...], it is impossible to design a sublinear property tester for this problem.

Further, we show how to use our tester to maintain (under insertion and deletion of points) an approximate k-clustering in ℝ^d of size at most (1 + ε) times the optimum in time polylog(n) for any constants k, d, and ε. Here, n denotes the current number of points.



2 Testing Convex Position 

Let us first consider the classical problem of testing whether a point set P in the plane is in convex position (that is, the interior of the convex hull conv(P) contains no point from P, or equivalently, all points in P are extreme). Our goal is to consider a practical situation in which we allow some relaxation of the exact decision test, and we consider the following type of testers:
Fig. 1. Which of the two point sets is “more convex”? In Figure (a) it is enough to delete only 3 points (those in the top right corner) to obtain a point set in convex position; in Figure (b) one has to remove many more points to do so. On the other hand, the points in Figure (a) are visually far from convex position, while the points in Figure (b) look as if they were in convex position; it is enough to perturb them very slightly to obtain a point set in convex position.



If P is in convex position, then the tester must accept the input. 

If P is “far” from convex position, then the tester “typically” rejects the input. 

If P is not in convex position, but it is close to being so, then the answer may be arbitrary.

In order to use this concept we must formalize some of the notions used above. First of all, we assume that a tester is a possibly randomized algorithm and, following the standard literature in this area, by “typically” we mean that the required answer is output with probability at least 2/3, where the probability is over the random choices made by the tester (and thus this lower bound of 2/3 is independent of the input).

2.1 Distance Measures — Far or Close 

A more subtle issue is what we mean by saying that P is “far” from convex position. We pick a parameter ε, 0 < ε < 1, which measures how “close” P is to convex position. In the standard terminology used in the property testing literature (see, e.g., [...]), one uses the following definition:

Definition 2.1. (Hamming distance) A point set P in the Euclidean space is ε-far from convex position (according to the Hamming distance) if for any subset S ⊆ P with |S| < ε·|P|, the set P \ S is not in convex position.
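Definition 2.1 can be checked directly on tiny planar inputs by searching for the largest subset in convex position. The brute-force sketch below (exponential time, plane only; all helper names are hypothetical) illustrates the definition and is not a tester:

```python
from itertools import combinations

def cross(o, a, b):
    """Twice the signed area of triangle oab (> 0 for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_convex_position(S):
    """True iff no point of S lies in the interior of conv(S)."""
    h = hull(S)
    if len(h) < 3:
        return True  # a point or a segment has empty interior
    edges = list(zip(h, h[1:] + h[:1]))
    return not any(all(cross(a, b, p) > 0 for a, b in edges) for p in S)

def hamming_far(P, eps):
    """eps-far iff at least eps*|P| points must be deleted (largest subset first)."""
    n = len(P)
    for m in range(n, -1, -1):
        if any(in_convex_position(list(T)) for T in combinations(P, m)):
            return n - m >= eps * n
    return True  # not reached: the empty subset is always in convex position
```

For the unit square plus its center one deletion suffices, so the set is ε-far only for ε ≤ 1/5, regardless of where the extra point lies.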

We found, however, that this measure often does not correspond to the notions of distance used in geometry (see, e.g., Figure 1). It tells only about combinatorial properties of the object at hand, but very little about the geometry behind the object. For example, do we want to accept an n-point set P if it contains fewer than εn points that are very far away from the remaining points, which are in convex position (as, for example, in Figure 1(a))? Or do we consider such a set P to be far from convex position? On the other hand, P may contain an ε fraction of points which make P non-convex, although after a very small perturbation of these points the resulting set is in convex position (see, e.g., Figure 1(b)). Do we want to call such a point set ε-far from convex position or not?
It is clear that the notion of distance is very application dependent; in this paper we investigate various distance measures which should be of practical interest, and study basic problems from computational geometry for these distance measures.

We begin with a distance that measures how much the input points must be moved (perturbed) in order to transform them into a set in convex position.

Definition 2.2. (Perturbation measure) A point set P in the d-dimensional unit cube is ε-far from convex position (according to the Perturbation measure) if for any perturbation of the points in P that moves each point by distance at most ε, the resulting point set is not in convex position.

Because of scaling, the assumption that P is enclosed by the unit cube is made without loss of generality.

We also introduce another measure that, although very similar to the Perturbation measure, will be more useful for our applications.

Definition 2.3. (Neighborhood measure) A point set P in the d-dimensional unit cube is ε-far from convex position (according to the Neighborhood measure) if there exists a point p ∈ P for which the d-dimensional ball of radius ε with center at p does not intersect the boundary of conv(P).
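In the plane, Definition 2.3 says that P is ε-far exactly when some input point lies at distance more than ε from the boundary of conv(P). A small exact check, assuming d = 2 and using Andrew's monotone chain for the hull (helper names are hypothetical; this is plain geometry, not a sublinear tester):

```python
import math

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def neighborhood_far(P, eps):
    """True iff some p in P has distance > eps to the boundary of conv(P),
    i.e. the closed ball of radius eps around p misses that boundary."""
    h = hull(P)
    edges = list(zip(h, h[1:] + h[:1]))  # degenerate hulls yield zero-length edges
    return any(min(dist_to_segment(p, a, b) for a, b in edges) > eps for p in P)
```

For the unit square plus its center, the center is at distance 0.5 from the boundary, so the set is ε-far for ε < 0.5 and ε-close for ε ≥ 0.5.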

The next measure is related to the volume discrepancy of the convex hull of the input points. It differs significantly from the Perturbation and Neighborhood measures, because this measure is relative to the volume of conv(P).

If a point set P is in convex position, then all points in P lie on the boundary of the convex hull conv(P), and therefore conv(P) is also the maximal convex hull defined by any (non-trivial) subset of P whose interior contains no point from P. In view of this, we may want to consider P to be close to convex position if a maximum (with respect to volume) convex hull defined by a subset of P that contains no point from P in its interior^ is almost the same as conv(P). If we use the volume measure for these two objects, then we get the following definition:

Definition 2.4. (Volume measure) A point set P in ℝ^d is ε-far from convex position (according to the Volume measure) if $\mathrm{vol}(\mathrm{EmpInt}(P)) / \mathrm{vol}(\mathrm{conv}(P)) < 1 - \varepsilon$, where vol(X) denotes the volume of object X and EmpInt(P) is a maximum-volume convex hull defined by a subset of P that contains no point from P in its interior.
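EmpInt(P) in Definition 2.4 can be found for tiny planar inputs by enumerating subsets. The sketch below (d = 2, exponential time, hypothetical helper names; it assumes conv(P) has positive area) computes the area ratio and compares it with 1 − ε:

```python
from itertools import combinations

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def area(poly):
    """Shoelace formula for a polygon whose vertices are given in order."""
    s = sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(poly, poly[1:] + poly[:1]))
    return abs(s) / 2

def strictly_inside(p, h):
    """True iff p lies in the interior of the counter-clockwise convex polygon h."""
    return len(h) >= 3 and all(cross(a, b, p) > 0 for a, b in zip(h, h[1:] + h[:1]))

def volume_far(P, eps):
    """Brute force over all subsets for vol(EmpInt(P)), then the ratio test."""
    best = 0.0
    for m in range(3, len(P) + 1):
        for T in combinations(P, m):
            h = hull(T)
            if len(h) >= 3 and not any(strictly_inside(p, h) for p in P):
                best = max(best, area(h))   # empty convex hull of a subset
    return best / area(hull(P)) < 1 - eps
```

For the unit square plus its center, the largest empty convex hull is a half-square triangle (the center lies on its diagonal), so the ratio is 1/2.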

Now, we are ready to formally define property testing algorithms. 

Definition 2.5. (ε-Testers) An algorithm is called an ε-tester for a property Q if it always accepts any input satisfying property Q and, with probability at least 2/3, rejects any input that is ε-far from satisfying property Q.

Throughout the paper, we say P is ε-close to convex position if it is not ε-far from convex position.

^ Observe that in general there may be many such maximum convex hulls.
Relations between different measures of closeness. How can we relate the four measures defined in Definitions 2.1-2.4? As we observed above, a point set P can be close to convex position according to the Hamming distance even if it is far (very far!) from convex position according to the Perturbation, the Neighborhood, and the Volume measures. Similarly, the opposite is also true: P may be very close to convex position according to the Perturbation, the Neighborhood, and the Volume measures even if it fails for the Hamming distance. But is there any relationship between the Perturbation, the Neighborhood, and the Volume measures?

The first lemma shows that the first two measures are in some sense equivalent with respect to the asymptotic complexity of the testers. It holds for any cost measure of query complexity, because exactly the same tester is used.

Lemma 2.1. There is an ε-tester for convex position according to the Perturbation measure with query complexity T(n, ε) if and only if there is an O(ε)-tester for convex position according to the Neighborhood measure with query complexity T(n, O(ε)). □

Unfortunately, we were unable to establish a similar relationship between the Neighborhood and the Volume measures; the latter seems to be more complicated. We can only prove some partial results about the similarity of these two measures, for example:

Lemma 2.2. An ε^d-tester for the Volume measure is an O(ε)-tester for the Neighborhood measure. □



3 A New Model Using Geometric Queries 

In previous work on property testing, the complexity of a tester has typically been measured as the number of input “atom objects” inspected, that is, as the number of queries to the input. The form of the queries allowed for the algorithm depended on the input representation. For example, if an input consists of a set of n points (as is the case for testing whether the points are in convex position), then it has typically been assumed that one can use queries of the form: “What is the position of the k-th point in the input?” In the standard query complexity, additional computational work is not counted (for example, if we know the positions of the points in S, S ⊆ P, then the cost of computing the convex hull of S is not counted in the query complexity^). Our main observation is that this notion of query complexity is often ill-suited to the study of geometric properties, or rather, of distance measures other than the Hamming distance. Indeed, if we want to check whether a point set P is ε-close to convex position according to the Perturbation

^ For example, in [2] it is shown that the query complexity for testing (according to the Hamming distance) whether a point set in ℝ^d is in convex position is O(n^[...]), while for d ≥ 4 the “running time complexity” (which also measures the time required for all computations used by the tester) is O(n polylog(1/ε) + polylog(n)), and it is quite possible that this is optimal. Thus, in the most basic case, for d = 4 and constant ε, the query complexity is [...] while the “running time complexity” is O(n). This difference vanishes for d = 2, 3, because in this case very efficient (almost linear-time) algorithms for testing convex position are available.
measure, then even a single point might be far away from the remaining points and make this property false. Therefore, the algorithm must find this point with probability at least 2/3, and this clearly requires Ω(n) query complexity. Similar phenomena also hold for the Neighborhood measure and the Volume measure.

This observation shows that in order to model property testing for geometric properties and to obtain very efficient (sublinear-time) algorithms, one has to reconsider and change the notion of query complexity. Unlike in the standard model, here we want to allow more complex queries: those using certain geometric properties of the input.

In most geometric models (and in applications), even if the input is represented by the positions of the points (or other geometric objects), one very often maintains additional data structures for efficient and structured access to the input. Among the most fundamental abstract data structures maintained by many algorithms working with points are data structures for efficiently answering range queries (cf. [...]). For the purposes of this paper we adopt a model of computation in which the basic operation is a range query to the input, and the query complexity is the number of range queries to the input.

Formally, we are given an unknown set P of n points in ℝ^d that is defined by an unknown function which, for a query range R ⊆ ℝ^d and an index i ∈ ℕ, returns the i-th point of P in R (according to some unknown fixed order), or the symbol empty if there are fewer than i points in R.
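The oracle of this formal model can be mimicked for experiments by a naive class that scans the points in a fixed order. The sketch below supports only axis-parallel boxes (not general simplices), answers each query in linear time rather than through a range tree or R-tree, and uses `None` for the symbol empty; the class and method names are hypothetical:

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

class OrthogonalRangeOracle:
    """Naive stand-in for the range-query oracle: returns the i-th point of P
    (1-based, in a fixed storage order) inside an axis-parallel box."""

    def __init__(self, points: Sequence[Point]) -> None:
        self._points: List[Point] = list(points)  # fixed order, hidden from the tester
        self.queries = 0                          # query-complexity counter

    def query(self, lo: Point, hi: Point, i: int) -> Optional[Point]:
        """i-th point p with lo <= p <= hi coordinate-wise, or None ("empty").
        Coordinates of lo/hi may be +-inf to express unbounded boxes."""
        self.queries += 1
        seen = 0
        for p in self._points:
            if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
                seen += 1
                if seen == i:
                    return p
        return None
```

A tester written against `query` is charged one unit per call, which is exactly the cost measure of the model; a practical implementation would answer the same calls with a range-searching structure.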

The model defined above uses a very powerful oracle, since we are allowed to specify an arbitrary range when we query the oracle. To make our considerations of practical value, it seems reasonable to require that such an oracle be efficiently implementable. Therefore, in this model we restrict ourselves to the case that R is a (possibly unbounded) simplex. Most of the results presented in this paper hold even for orthogonal range queries. Such queries are supported by many well-known data structures such as partition trees and cutting trees, as well as practical structures based on quad-trees or R-trees (see, e.g., [...] for a more detailed discussion). There are efficient data structures (see, e.g., [...]) that support our queries, and a single query to such a data structure is usually performed very fast (i.e., much faster than processing the whole point set). We believe that the use of such range queries is very natural in our context, since many applications (such as GISs) use data structures for range queries (e.g., R-trees) to answer other kinds of queries anyway.

In a similar way, depending on the problem at hand, one could assume that some other very basic geometric queries are available; we do not discuss this issue in more detail, however.

3.1 Property Testing Algorithms for Convex Position in the New Model 

In this section we present our first ε-tester for convex position. It works for the Neighborhood measure, and it shows that the use of geometric queries (orthogonal range queries) allows us to beat the lower bounds discussed in Section 3 and obtain a query complexity of polylog(1/ε).

^ We formalize our model of computation only for inputs in the form of point sets; for other input types the model can be defined accordingly in a similar way.
The input for our tester consists of a point set P in the d-dimensional unit cube and a real number ε, 0 < ε < 1, which defines the quality of the tester.



Convexity-Test I (P, ε):
    S = ∅
    partition the unit cube into (2√d/ε)^d sub-cubes of side-length ε/(2√d)
    for each such sub-cube c do
        if c contains a point from P then add any such point to S
    if S is in convex position then accept
    else reject



In algorithm Convexity-Test I (P, ε), the operation of verifying whether c contains a point from P, as well as the operation of returning a point from P ∩ c, is performed using orthogonal range queries.
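Convexity-Test I can be sketched in the plane on top of a naive “first point in a box” query, counting one range query per sub-cube. The code assumes d = 2 (so that convex position can be checked with a planar hull), rounds the number of cells per axis up to ⌈2√d/ε⌉, and uses hypothetical names throughout:

```python
import math
from itertools import product

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_convex_position(S):
    """True iff no point of S lies in the interior of conv(S)."""
    h = hull(S)
    if len(h) < 3:
        return True
    edges = list(zip(h, h[1:] + h[:1]))
    return not any(all(cross(a, b, p) > 0 for a, b in edges) for p in S)

def convexity_test_1(P, eps):
    """Planar sketch of Convexity-Test I. Returns (accept, number_of_range_queries)."""
    d = 2
    k = math.ceil(2 * math.sqrt(d) / eps)   # cells per axis; side 1/k <= eps/(2*sqrt(d))
    queries = 0
    S = {}
    for cell in product(range(k), repeat=d):
        lo = [c / k for c in cell]
        hi = [(c + 1) / k for c in cell]
        queries += 1                        # one orthogonal range query per sub-cube
        q = next((p for p in P if all(l <= x <= u for l, x, u in zip(lo, p, hi))), None)
        if q is not None:
            S[q] = None                     # dict drops duplicates, keeps order
    return in_convex_position(list(S)), queries
```

Because S is a subset of P, a set in convex position is always accepted; the number of queries depends only on ε, not on |P|.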

Theorem 3.1. Let P be a point set in the d-dimensional unit cube and let ε be a real number, 0 < ε < 1. Algorithm Convexity-Test I (P, ε) is a property tester that accepts P only if P is ε-close (according to the Neighborhood measure) to convex position. It uses O((√d/ε)^d) orthogonal range queries. □

Actually, we can slightly improve the complexity of Convexity-Test I (P, ε) and design an ε-tester that uses only O((√d/ε)^{d−1}) orthogonal range queries.

In the previous sections we discussed testing convexity properties in a geometric setting. Now, we give a tester for testing convexity properties of planar point sets using a distance measure that is related to the Hamming distance (in fact, the distance measure below is stronger). A tester for the Hamming distance is presented in [...]. The main difference in our approach here is the use of geometric queries, which leads to a substantial speedup.

Definition 3.1. A set P of n points in the plane is ε-far from being in convex position if at least εn points in P are not extreme. (A point is extreme if it belongs to the boundary of the convex hull of P.)
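Under Definition 3.1 the distance is just the number of points strictly inside the convex hull (points on the hull boundary count as extreme). A direct planar check with full access to P rather than range queries; the helper names are hypothetical:

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def non_extreme(P):
    """Points of P strictly inside conv(P); boundary points are extreme."""
    h = hull(P)
    if len(h) < 3:
        return []
    edges = list(zip(h, h[1:] + h[:1]))
    return [p for p in P if all(cross(a, b, p) > 0 for a, b in edges)]

def far_from_convex_position(P, eps):
    """Definition 3.1: at least eps*n points are not extreme."""
    return len(non_extreme(P)) >= eps * len(P)
```

A point in the middle of a hull edge is not a hull vertex but still lies on the boundary, so it is counted as extreme.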



The following lemma follows immediately.

Lemma 3.1. A tester for the distance measure in Definition 3.1 is also a tester for the Hamming distance. □



We can also prove the following lemma. 



Lemma 3.2. In the standard property testing model (see, e.g., [...]), there is no testing algorithm for the distance measure from Definition 3.1 that has o(n) query complexity. □



We can prove that the use of appropriate data structures for the geometric queries 
allows us to design a tester with logarithmic query complexity. We assume the input 
point set P is in general position. 



Theorem 3.2. There is a tester for convex position in the plane with query complexity O(log n/ε) that uses only triangular range queries. □
4 Map Labeling

In this section we consider the following basic map labeling problem: 

Let P be a set of n points in the plane. Decide whether it is possible to place n 
axis-parallel unit squares such that 

all squares are pairwise disjoint (labels do not overlap), 

each point is a corner of exactly one square (each point is labeled), and 

each square has exactly one point on its corners (each point has a unique label). 

If a set S of n squares satisfies the conditions above, then S is called a valid labeling for P. The map labeling problem is known to be NP-complete, and the corresponding optimization problem is known to have no approximation algorithm with ratio better than 2 unless P = NP [...].
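The three conditions of a valid labeling can be verified directly for a proposed labeling. The sketch below represents each unit square by its lower-left corner and interprets “pairwise disjoint” as “open interiors do not intersect” (touching boundaries allowed); that interpretation and all names are assumptions:

```python
def corners(sq):
    """The four corners of the unit square with lower-left corner sq."""
    x, y = sq
    return {(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)}

def interiors_overlap(s, t):
    """True iff the open unit squares with lower-left corners s and t intersect."""
    return abs(s[0] - t[0]) < 1 and abs(s[1] - t[1]) < 1

def is_valid_labeling(points, squares):
    """squares[i] is the lower-left corner of the square labeling points[i]."""
    if len(points) != len(squares) or len(set(points)) != len(points):
        return False
    pset = set(points)
    for p, sq in zip(points, squares):
        # p must be a corner of its square, and no other input point may be one
        if corners(sq) & pset != {p}:
            return False
    return not any(interiors_overlap(squares[i], squares[j])
                   for i in range(len(squares)) for j in range(i + 1, len(squares)))
```

For two points at distance 1 on a horizontal line, labeling both to the right fails (the second point sits on a corner of the first square), while labeling the left one to the left succeeds.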

In this section we develop a property tester for the map labeling problem. We use 
the following Hamming distance measure: 

Definition 4.1. A set P of n points in the plane is ε-far from having a valid labeling if we have to delete at least εn points to obtain a set of points that has a valid labeling.

When we consider the standard property testing model [...], which only allows sampling random subsets of P with a uniform distribution, we can prove the following result.

Theorem 4.1. For any constant δ, 0 < δ < 1, there is a positive constant ε such that there is no ε-tester for the labeling problem with o(n^[...]) query complexity in the standard testing model. □

We now show that if we use a computational model that supports geometric queries, we can design a tester with O(1/ε^3) query complexity. It is based on the approach developed in [...] and [...].



LabelTest(P):
    choose a sample set S of size O(1/ε) uniformly at random from P
    for each p ∈ S do
        i = 0, T = ∅
        let R be the axis-parallel square with center p and side length 16⌈1/ε⌉
        while i < (16⌈1/ε⌉ + 2)^2 do
            i = i + 1
            let q be the i-th point in the query range R
            if q ≠ empty then T = T ∪ {q}
        if T does not have a valid labeling then reject
    accept



Theorem 4.2. Algorithm LabelTest is a tester for the labeling problem that has query complexity O(1/ε^3) and running time exp(O(1/ε^3)). □
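LabelTest can be sketched end to end with a backtracking labeling check (the exponential part of the running time) and a naive range oracle. The sample size ⌈4/ε⌉ (`sample_const`), the seeded random generator, the open-interior reading of “disjoint”, and all helper names are hypothetical choices; the early `break` when the oracle returns empty only skips queries that would also return empty:

```python
import math
import random

def _corners(sq):
    x, y = sq
    return {(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)}

def _overlap(s, t):
    """Open unit squares (given by lower-left corners) intersect."""
    return abs(s[0] - t[0]) < 1 and abs(s[1] - t[1]) < 1

def has_valid_labeling(points):
    """Backtracking over the 4 corner positions per point (exponential time)."""
    pts = list(dict.fromkeys(points))
    pset = set(pts)
    placed = []

    def place(k):
        if k == len(pts):
            return True
        x, y = pts[k]
        # p as lower-left, lower-right, upper-left, upper-right corner
        for dx, dy in ((0, 0), (-1, 0), (0, -1), (-1, -1)):
            sq = (x + dx, y + dy)
            if any(_overlap(sq, other) for other in placed):
                continue
            if (_corners(sq) & pset) - {pts[k]}:
                continue                    # another input point on a corner
            placed.append(sq)
            if place(k + 1):
                return True
            placed.pop()
        return False

    return place(0)

def label_test(P, eps, sample_const=4, rng=None):
    """Returns True (accept) or False (reject)."""
    rng = rng or random.Random(0)
    side = 16 * math.ceil(1 / eps)
    max_points = (side + 2) ** 2

    def ith_point_in(lo, hi, i):            # naive orthogonal range oracle, 1-based i
        inside = [p for p in P if all(l <= c <= h for l, c, h in zip(lo, p, hi))]
        return inside[i - 1] if i <= len(inside) else None

    for _ in range(math.ceil(sample_const / eps)):
        p = rng.choice(P)
        lo = (p[0] - side / 2, p[1] - side / 2)
        hi = (p[0] + side / 2, p[1] + side / 2)
        T, i = [], 0
        while i < max_points:
            i += 1
            q = ith_point_in(lo, hi, i)
            if q is None:
                break                       # fewer than i points in R
            T.append(q)
        if not has_valid_labeling(T):
            return False
    return True
```

Four points packed into a tiny region can still be labeled (one square per quadrant), but a fifth forces two equally oriented squares to overlap, so such a cluster is rejected.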
5 Clustering Problems

In this section we design testing algorithms for three geometric clustering problems. The goal of a clustering problem is to decide whether a set P of n points in ℝ^d can be partitioned into k subsets (called clusters) S_1, ..., S_k such that the cost of each cluster is at most b. There are several different ways to define the cost of a cluster. Let S be a set of points in ℝ^d. We consider the following variants:

Radius Clustering: The cost cost_R(S) of a cluster S is twice the minimum radius of a ball containing all points of the cluster.

Discrete Radius Clustering: The cost cost_DR(S) of a cluster S is the minimum radius of a ball containing all points of the cluster and having its center among the points from P.

Diameter Clustering: The cost cost_D(S) of a cluster S is the maximum distance between a pair of points of the cluster.

The goal of our property tester is to accept all instances that admit a clustering into k subsets of cost b and to reject with high probability those instances that cannot be clustered into k subsets of cost (1 + ε)b.

Definition 5.1. A point set P is (b, k)-clusterable for a cost measure cost(·) if there is a partition of P into sets S_1, ..., S_k such that cost(S_i) ≤ b for all 1 ≤ i ≤ k. A point set P is ε-far from being (b, k)-clusterable if for any partition of P into sets S_1, ..., S_k at least one set S_i has cost larger than (1 + ε)b.

In the standard context of property testing, the Hamming distance (that is, a point set is ε-far from clusterable if we have to remove at least εn points to make it clusterable) has been used before [2]. For the diameter clustering problem, the distance measure used in [3] has the additional relaxation that a point set is ε-far from (b, k)-clusterable if one has to remove εn points to make the set ((1 + ε)b, k)-clusterable. Thus, this definition assumes both a geometric and a combinatorial relaxation of the corresponding decision problem. We require only the geometric relaxation.

Let us assume, without loss of generality, that $b = 1$; thus we want to design
a tester for the problem whether a point set $P$ is $(1,k)$-clusterable for the three cost
measures above. We partition $\mathbb{R}^d$ into grid cells of side length $\varepsilon/(3\sqrt{d})$. For each cell
containing an input point, we choose an arbitrary input point from the cell as its
representative. Then we compute whether the set of representatives is $(1,k)$-clusterable. If it
is, we accept; otherwise we reject. Clearly, any set of points that is
$(1,k)$-clusterable is accepted by the algorithm. On the other hand, any instance that is
$\varepsilon$-far from $(1,k)$-clusterable will be rejected: every point lies within distance $\varepsilon/3$ (the
diameter of a cell) of its representative, so a clustering of the representatives of cost 1
extends to a clustering of $P$ of cost at most $1 + 2\varepsilon/3 < 1 + \varepsilon$. (This approach has been
introduced in [3] to obtain a $(1+\varepsilon)$-approximation algorithm for the radius clustering problem.)
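The grid-rounding tester just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: it handles only the diameter cost, scans the point list directly instead of issuing range queries, and decides clusterability of the representatives by exhaustive search over all label assignments, so it is only usable for tiny inputs. The helper names `grid_representatives`, `diameter_clusterable`, and `grid_tester` are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def grid_representatives(points: Sequence[Point], side: float) -> List[Point]:
    """Keep one arbitrary input point per occupied grid cell of the given side length."""
    reps: Dict[Tuple[int, ...], Point] = {}
    for p in points:
        cell = tuple(math.floor(x / side) for x in p)
        reps.setdefault(cell, p)  # the first point seen in a cell represents it
    return list(reps.values())


def diameter_clusterable(points: Sequence[Point], k: int, b: float) -> bool:
    """Brute force: can the points be split into k clusters of diameter <= b?"""
    n = len(points)
    for labels in itertools.product(range(k), repeat=n):
        if all(math.dist(points[i], points[j]) <= b
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j]):
            return True
    return False


def grid_tester(points: Sequence[Point], k: int, eps: float, b: float = 1.0) -> bool:
    """Accept iff the grid representatives are (b, k)-clusterable (diameter cost)."""
    d = len(points[0])
    side = eps * b / (3 * math.sqrt(d))  # every cell then has diameter eps * b / 3
    return diameter_clusterable(grid_representatives(points, side), k, b)
```

For example, five points forming two tight groups ten units apart are accepted for $k = 2$ and rejected for $k = 1$ when $\varepsilon = 0.5$.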

Our algorithm starts with a single box whose endpoints lie at infinity, i.e., the whole space.
Then we query for a point in this box. We allocate the corresponding grid cell and partition
the box into the sub-boxes induced by the hyperplanes bounding the grid cell. Then we continue
with one of these sub-boxes. If we find an empty sub-box, it is marked. When only marked
boxes remain, the algorithm terminates.
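The box-splitting exploration can be simulated as follows. This is a sketch, not the authors' code: the range-query oracle `report_one` is a hypothetical stand-in implemented by a linear scan, boxes are half-open, and the grid side is assumed to make the cell boundaries `c * side` exact (for example `side = 1.0`). Every sub-box is a union of whole grid cells that excludes the cells already found, so each successful query yields a new representative; passing `limit=V + 1` mimics stopping once more than $V$ representatives are known.

```python
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = List[Tuple[float, float]]  # half-open interval [lo, hi) per dimension


def report_one(points: Sequence[Point], box: Box) -> Optional[Point]:
    """Hypothetical range-query oracle: return some point inside the box, or None."""
    for p in points:
        if all(lo <= x < hi for x, (lo, hi) in zip(p, box)):
            return p
    return None


def explore_cells(points: Sequence[Point], side: float, limit: float = math.inf):
    """Find one representative per occupied grid cell using range queries only.

    Returns the representatives keyed by cell index and the number of queries.
    """
    d = len(points[0])
    pending: List[Box] = [[(-math.inf, math.inf)] * d]  # start with all of R^d
    reps: Dict[Tuple[int, ...], Point] = {}
    queries = 0
    while pending and len(reps) < limit:
        box = pending.pop()
        queries += 1
        p = report_one(points, box)
        if p is None:
            continue  # empty sub-box: it is "marked" and never queried again
        cell = tuple(math.floor(x / side) for x in p)
        reps[cell] = p
        # Cut each coordinate interval at the hyperplanes bounding p's grid cell.
        cuts = []
        for (lo, hi), c in zip(box, cell):
            a, b = c * side, (c + 1) * side
            cuts.append([(lo, a), (a, b), (b, hi)])
        for choice in itertools.product(range(3), repeat=d):
            if all(j == 1 for j in choice):
                continue  # this piece is the cell itself, already represented
            sub = [cuts[dim][j] for dim, j in enumerate(choice)]
            if all(lo < hi for lo, hi in sub):
                pending.append(sub)
    return reps, queries
```

Each representative spawns at most $3^d - 1$ new sub-boxes, which is where the $3^d$ factor in the query count below comes from.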

So far, our partition into grid cells works fine if there are many points in a single
cell. On the other hand, if no two points are in the same grid cell, the algorithm has $\Omega(n)$
query complexity. Thus we need an upper bound on the number of grid cells whose
representatives may form a cluster.

Lemma 5.1. Let $S$ be a set of points in $\mathbb{R}^d$, no two of which belong to the same
cell of a grid of size $\varepsilon/(3\sqrt{d}) < 1$. If $\mathrm{cost}(S) \le 1$ for any of the three cost measures
described above, then $|S| \le (6\sqrt{d}/\varepsilon)^d$, where $\mathrm{cost} \in \{\mathrm{cost}_R, \mathrm{cost}_{DR}, \mathrm{cost}_D\}$. □

Let $V = k \cdot (6\sqrt{d}/\varepsilon)^d$ be the maximum number of cells that can contain points that
belong to one of the $k$ clusters. We observe that we can stop our procedure as soon as the number
of representatives exceeds $V$, since the instance then cannot be $(1,k)$-clusterable. Thus, we can
guarantee that the algorithm requires at most $V \cdot 3^d$ range queries.

Theorem 5.1. There is an $\varepsilon$-tester for the radius clustering and diameter clustering
problems that uses at most $k \cdot (18\sqrt{d}/\varepsilon)^d$ orthogonal range queries. There is an $\varepsilon$-tester
for the discrete radius clustering problem that uses $k \cdot (162\sqrt{d}/\varepsilon)^d$ orthogonal range
queries. □
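As a quick check of the arithmetic behind the first bound (radius and diameter variants): each representative that is found splits its box into at most $3^d - 1$ further sub-boxes, so with at most $V$ representatives the number of range queries is bounded by

$$V \cdot 3^d = k\left(\frac{6\sqrt{d}}{\varepsilon}\right)^{d} \cdot 3^{d} = k\left(\frac{18\sqrt{d}}{\varepsilon}\right)^{d}.$$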

5.1 Dynamic Clustering 

In this section we consider the problem of maintaining an approximate clustering of
points in $\mathbb{R}^d$ under the operations insert and delete. Obviously, we can call the decision
procedure from the previous section $O(\log_{1+\varepsilon} B)$ times to find a clustering of size at
most $(1+\varepsilon)B$, where $B$ is the size of an optimal clustering and $B > 1$. When we combine
this with a dynamic data structure that supports orthogonal range queries in time $A(n)$
(to report a single point in the query range) and updates in time $U(n)$, we immediately obtain
the following result.
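The $O(\log_{1+\varepsilon} B)$ repetitions of the decision procedure amount to trying geometrically growing cost bounds. A minimal sketch, assuming a hypothetical callable `decide(b)` that stands in for the tester run with cost bound `b` (after rescaling) and that eventually accepts:

```python
from typing import Callable, Tuple


def approximate_cost(decide: Callable[[float], bool], eps: float) -> Tuple[float, int]:
    """Return the first bound b = (1 + eps)^i, i = 0, 1, ..., accepted by `decide`,
    together with the number of calls made."""
    b, calls = 1.0, 0
    while True:
        calls += 1
        if decide(b):  # hypothetical: tester run with cost bound b
            return b, calls
        b *= 1 + eps
```

If the procedure accepts every $b \ge B$, the loop stops after at most $\lceil \log_{1+\varepsilon} B \rceil + 1$ calls.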

Corollary 5.1. We can maintain a $(1+5\varepsilon)$-approximate radius/diameter clustering
of a point set $P$ in $\mathbb{R}^d$ ($d$ constant) under the operations insert and delete in time
$O\big(U(n) + \log_{1+\varepsilon} B \cdot (A(n) \cdot k/\varepsilon^d + \exp(O(k/\varepsilon^d)))\big)$. If the parameters $\varepsilon$, $d$, and $k$ are
constants, this is $O(U(n) + A(n) \cdot \log_{1+\varepsilon} B)$ time. □

Now we want to obtain a time bound that is independent of the size of the clustering.
We require an additional kind of oracle access: we allow the tester to query the oracle
for the number of points within a certain range (this could also be performed in
our prior model at logarithmic cost). We also need a procedure to compute a minimum
(axis-parallel) bounding box for the points inside a given cell. This can easily be done
with $O(d \log n)$ expected oracle accesses.
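The text does not say how the bounding box is obtained. One way to reach an $O(d \log n)$ expected bound, sketched here as an assumption rather than as the authors' procedure, is an oracle that reports a uniformly random point of the query range: repeatedly asking for a point strictly below (or above) the current one in coordinate $j$ only visits successive records of a random order, whose expected number is the harmonic number $H_m = O(\log m)$. The oracle `random_point_in` is hypothetical and implemented by a linear scan; `math.nextafter` requires Python 3.9 or later.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def random_point_in(points: Sequence[Point], lo: Sequence[float], hi: Sequence[float],
                    rng: random.Random) -> Optional[Point]:
    """Hypothetical oracle: a uniformly random input point with lo <= p < hi
    coordinatewise, or None if that range is empty."""
    inside = [p for p in points if all(l <= x < h for x, l, h in zip(p, lo, hi))]
    return rng.choice(inside) if inside else None


def bounding_box(points: Sequence[Point], lo: Sequence[float], hi: Sequence[float],
                 rng: random.Random):
    """Bounding box of the input points in the cell [lo, hi) and the number of
    oracle calls; returns (None, 1) if the cell is empty."""
    start = random_point_in(points, lo, hi, rng)
    if start is None:
        return None, 1
    calls = 1
    low: List[float] = list(start)
    high: List[float] = list(start)
    for j in range(len(lo)):
        q = start
        while True:  # is there a point strictly below q in coordinate j?
            box_hi = list(hi)
            box_hi[j] = q[j]
            nxt = random_point_in(points, lo, box_hi, rng)
            calls += 1
            if nxt is None:
                break
            q = nxt
        low[j] = q[j]
        q = start
        while True:  # is there a point strictly above q in coordinate j?
            box_lo = list(lo)
            box_lo[j] = math.nextafter(q[j], math.inf)
            nxt = random_point_in(points, box_lo, hi, rng)
            calls += 1
            if nxt is None:
                break
            q = nxt
        high[j] = q[j]
    return (tuple(low), tuple(high)), calls
```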

To avoid a simple binary search, we use this bounding box. The size of a bounding
box will always be the length of its longest side. Then we compute a clustering $C$ for
the current grid size (using the representatives of each cell). If $l$ is the size of the largest
bounding box of all grid cells, then we know that $P$ can be clustered at cost at most
$\mathrm{cost}(C) + 2\sqrt{d} \cdot l$. Note that we can stop our process if $\ldots < \varepsilon/(3\sqrt{d})$, where $s$
is the current size of the grid.

If we cannot stop, we continue with a grid of size $l/2$. This way the number of grid
cells with representatives is at most $3^d$ times the previous number of grid cells, and there
is at least one more such cell. We continue this procedure until we get a lower bound on
the size of the current clustering. Then we have to do a logarithmic number of further
steps and we are done.
Theorem 5.2. We can maintain a $(1+\varepsilon)$-approximate radius/diameter clustering of
a point set $P$ in $\mathbb{R}^d$ ($d$ constant) under the operations insert and delete in time
$U(n) + O\big(\exp(O(V)) \cdot (k + \log(1/\varepsilon) + A(n) \cdot V \cdot \log n \cdot (k + \log(1/\varepsilon)))\big)$. If $k$, $d$, and $\varepsilon$ are
constants, then this is $O(U(n) + A(n) \cdot \log n)$. □
Abstract. Motivated by questions in location planning, we show for a
set of colored point sites in the plane how to compute the smallest (by
perimeter or area) axis-parallel rectangle and the narrowest strip
enclosing at least one site of each color.



1 Introduction 

We are given a set of $n$ point sites in the plane and $k \le n$ colors; each site is
associated with one color. A region of the plane is called color-spanning if it contains
at least one point of each color. For different kinds of regions we are interested
in the smallest color-spanning one of that kind.

The original motivation for our questions comes from location planning. Suppose
there are $k$ types of facilities, e.g. schools, post offices, supermarkets, modeled
by $n$ colored points in the plane, each type by its own color. One basic
goal in choosing a residence location is to have at least one representative of
each facility type in the neighborhood, where there are various specifications of
the term “neighborhood”. A natural question is to ask for the smallest color-spanning
circle. It can be found using the upper envelope of Voronoi surfaces, as
described by Huttenlocher et al. […] and Sharir and Agarwal […, Section 8.7];
their algorithm for computing the solution runs in time $O(kn \log n)$. Similarly,
one can determine the smallest color-spanning axis-parallel square and other objects
with fixed orientation which are unit circles of a convex distance function.

In this paper, we propose to solve more complicated problems in this context:
we give algorithms to compute the smallest color-spanning axis-parallel rectangle
in Sect. 2 and the narrowest color-spanning strip in Sect. 3.

German team was supported by DAAD grant 314-AI-e-dr.

A couple of related optimization problems for a set $S$ of $n$ points have already
been studied in the literature, with motivations from statistical clustering or
pattern recognition. For example, the convex polygon with minimum perimeter
containing $k$ points of $S$ can be found by using the methods of Dobkin et al. […],
Aggarwal et al. [2], or finally Eppstein and Erickson […], the last one in time
0(n log n -|- k^n). The minimum area convex polygon containing k points of S 
can be determined in time 0{v?'\ogn + min(fc^, n)) combining results of 0 
and Eppstein et al. […]. Similar problems for selecting $k$ points out of $n$ use as
optimization criterion the diameter or the variance of the $k$-set, or they ask for
the smallest circle with respect to a certain metric containing at least $k$ points [3];
this latter problem is of course very closely related to the Voronoi
diagram of order $k$.

Other very natural optimization criteria are the perimeter and the area of the
axis-parallel rectangle enclosing a $k$-point set; these criteria are sometimes briefly
called the $L_\infty$ perimeter and $L_\infty$ area. For computing the smallest perimeter,
the best known running time is 0{nlogn + k'^n) for algorithms by Datta et al. |2] 
and by Eppstein and Erickson jS|. The algorithm of Aggarwal et al. |2| can be 
used for both variants of the problem, the area and the perimeter, and takes time 
0(min(fc^nlogn,n^)), while Segal and Kedem’s solution fB] for both variants 
runs in 0{n + k{n — fc)^) time and is applicable only for k > n/2. 

Not much seems to be known about lower bounds for these problems. Matoušek
[12] reports that at least some of them are known to be $n^2$-hard, a notion
introduced by Gajentaan and Overmars […]; compare also Erickson and Seidel […].

Interestingly, the approach in […] for the smallest rectangle, like some
approaches for the smallest circle, is also based on the Voronoi diagram of higher
order, in this case of order $6k - 6$. This is because the optimal $k$-point set can be
shown to be contained in a circle centered at such a Voronoi vertex which passes
through the corresponding sites. Eppstein and Erickson’s approach […] uses the
fact that the members of the optimal $k$-point set are always among the $16k$
nearest rectilinear neighbors of each of them. But neither of these proximity-based
properties seems to be extensible to our new problem, which involves colors.

For multicolored point sets, there are solutions to several problems, such as
the bichromatic closest pair, see e.g. Preparata and Shamos […, Section 5.7],
Agarwal et al. […], and Graf and Hinrichs [10]; the group Steiner tree where, for
a graph with colored vertices, the objective is to find a minimum-weight subtree
that covers all colors, see Mitchell […, Section 7.1]; or the chromatic nearest
neighbor search, see Mount et al. […].

2 The Smallest Color-Spanning Rectangle 

Among all rectangles whose sides are parallel to the x- and y-axis and which 
contain at least one site of each color we are looking for the smallest one, by 
perimeter or by area. 

Some special cases are immediately solved. For $k = 1$ the problem is trivial,
and for $k = n$, i.e., when we have exactly one point for each color, the solution is the
bounding box of the point set. Also the case $k = n - C$ for some constant $C$
can be solved in time $O(n)$, because in this case there is only a constant ($\le 2C$)
number of sites with colors that have more than one site. Finally, for $k = 2$ in the
perimeter case, we can make use of an algorithm that computes the bichromatic
$L_1$-closest pair in time $O(n \log n)$, see e.g. Graf and Hinrichs [10]. For the general
case, however, new ideas are necessary.
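For $k = 2$ the reduction is easy to see: every color-spanning rectangle contains some pair of differently colored sites and hence their bounding box, whose perimeter is twice their $L_1$ distance. The sketch below checks this with a quadratic brute force over all bichromatic pairs instead of the $O(n \log n)$ closest-pair algorithm cited above; the function name is hypothetical and colors are assumed to be labeled 0 and 1.

```python
from itertools import product
from typing import Sequence, Tuple

Site = Tuple[float, float, int]  # (x, y, color) with color in {0, 1}


def min_perimeter_two_colors(sites: Sequence[Site]) -> float:
    """Smallest perimeter of an axis-parallel rectangle holding both colors.

    The bounding box of p and q has perimeter 2 * (|px - qx| + |py - qy|),
    so the optimum is twice the bichromatic L1 closest-pair distance.
    Brute force over all bichromatic pairs: O(n^2).
    """
    color0 = [s for s in sites if s[2] == 0]
    color1 = [s for s in sites if s[2] == 1]
    return min(2 * (abs(p[0] - q[0]) + abs(p[1] - q[1]))
               for p, q in product(color0, color1))
```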

In Sect. 2.1 we show that the optimal rectangle must fulfill the so-called
non-shrinkable property, we present a simple algorithm with running time in
$O(n(n-k)^2)$, and we prove the tight bound of $\Theta((n-k)^2)$ for the number of
non-shrinkable rectangles. In Sect. 2.2 we give necessary and sufficient conditions
for non-shrinkable rectangles, and we refine the algorithm to a running
time of $O(nk(n-k))$. Finally, in Sect. 2.3 we use a result by Overmars and van
Leeuwen for dynamically maintaining the maximal elements of a point set
to further improve the running time to a near-optimal $O(n(n-k)\log^2 k)$.



2.1 Non-shrinkable Rectangles and a First Algorithm 

Let $p_x$ and $p_y$ denote the coordinates of a site $p$ and $p_{col}$ its color. For
the sake of simplicity of the presentation, we make the following assumption on
general position: no two $x$- or $y$-coordinates are equal, i.e., there is no horizontal
or vertical line passing through two points. We exclude the trivial cases and
assume for the remaining part of the paper that $1 < k < n$. It is clear that
the smallest color-spanning rectangle, by perimeter or by area, must be
non-shrinkable in the following sense.

Definition 1. An axis-parallel rectangle is called non-shrinkable if it contains 
sites of all k colors and it does not properly contain another axis-parallel rectangle 
that contains all colors. 

Therefore, each non-shrinkable rectangle must touch a site with each of its
four edges, such that there are two, three, or four sites on its boundary, no two
of them of the same color. The colors on its boundary do not appear at
sites in its interior.

Our algorithms will systematically find all non-shrinkable rectangles and
compare their perimeters or areas to determine the smallest one, thereby solving
the two variants of the problem at the same time. A first and quite simple idea
for doing this is shown next; it is similar to the procedure of […].

Algorithm 1 The lower-left corner of a candidate is determined either
by one site or by a pair of sites of different colors such that these two
sites lie on the candidate’s bottom and left edges.

For each such lower-left corner, we proceed as follows. Let $U$ be the set
of sites which lie above and to the right of the corner.

1. Initially the top edge of the rectangle is at infinity. The right
edge starts at the $x$-coordinate of the corner; it is moved right over
the sites of $U$ until all colors are contained in the current rectangle.

2. Then, in decreasing $y$-order, the top edge steps through the sites of
the rectangle as long as it still contains all colors; when this stops,
we have found a candidate.

3. The right edge advances until it reaches a site with the color of the
site at the current top edge.

4. As long as the right edge has not stepped over all points of $U$, we
repeat from step 2.
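A compact way to experiment with Algorithm 1 is the following Python sketch. It is a simplified variant, not a transcription: it enumerates the same lower-left corners, but instead of stepping the top edge down incrementally (steps 2 and 3) it recomputes the lowest feasible top edge as the largest per-color minimum $y$-coordinate among the sites passed by the right edge. This still reports every non-shrinkable rectangle, and every reported rectangle is color-spanning, so the minima are exact, although each corner costs $O(nk)$ instead of $O(n)$. Colors are assumed to be $0, \ldots, k-1$, the input is assumed to be in general position, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Site = Tuple[float, float, int]  # (x, y, color); colors are 0, ..., k-1


def smallest_rectangle_corners(sites: Sequence[Site], k: int) -> Tuple[float, float]:
    """(min perimeter, min area) of a color-spanning axis-parallel rectangle.

    Enumerates lower-left corners as in Algorithm 1 and sweeps the right edge
    over U in x-order; for each right edge the lowest feasible top edge is the
    largest per-color minimum y among the sites swept so far.
    """
    corners: List[Tuple[Site, Site]] = [(s, s) for s in sites]  # one site on both edges
    corners += [(ls, bs) for ls in sites for bs in sites      # (left site, bottom site)
                if ls[2] != bs[2] and bs[0] > ls[0] and bs[1] < ls[1]]
    best_perimeter = best_area = math.inf
    for left_site, bottom_site in corners:
        left, bottom = left_site[0], bottom_site[1]
        upper_right = sorted((s for s in sites if s[0] >= left and s[1] >= bottom),
                             key=lambda s: s[0])  # the set U, in x-order
        lowest = {}  # color -> lowest y among sites passed by the right edge
        for x, y, c in upper_right:
            lowest[c] = min(y, lowest.get(c, math.inf))
            if len(lowest) == k:  # all colors present: a candidate rectangle
                top = max(lowest.values())
                width, height = x - left, top - bottom
                best_perimeter = min(best_perimeter, 2 * (width + height))
                best_area = min(best_area, width * height)
    return best_perimeter, best_area
```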

It is clear that all non-shrinkable rectangles are checked as candidates by
Algorithm 1, but also some more rectangles that may contain sites of the same
color as the left or bottom edges. For each corner the algorithm spends time
$O(n)$ if the sites have previously been sorted by their $x$-coordinates and by their
$y$-coordinates.

Note that for one fixed corner we cannot have more than $n - k + 1$ candidates,
because the right edge has stepped over at least $k$ sites in step 1. Furthermore,
the left edge of a candidate can only be at one of the first $n - k + 1$ sites
in $x$-order and the lower edge only at one of the first $n - k + 1$ sites in $y$-order, so we
obtain an $O(n(n-k)^2)$ bound for the running time of Algorithm 1.

Before trying to improve on this time bound, we are interested in determining 
the exact number of non-shrinkable rectangles, since in the worst case it seems 
unavoidable to check (nearly) all of them. 

Lemma 1. There are $\Theta((n-k)^2)$ non-shrinkable rectangles.

Proof. We start by proving the upper bound. As we have remarked earlier, each
edge of a non-shrinkable rectangle $N$ must contain a site of a color that occurs
only once in $N$. First consider the case that a site is a corner of the rectangle,
i.e., the site touches two edges. A site can be, e.g., the lower left corner of a
non-shrinkable rectangle only if it belongs to the $n - k + 1$ leftmost sites, because
it must have $k - 1$ sites to its right. Also, in the analysis of Algorithm 1 we have
seen that there are at most $n - k + 1$ non-shrinkable rectangles for one fixed
corner. Thus, the upper bound holds for all non-shrinkable rectangles that have
at least one site at a corner. In particular, this also settles the cases $k = 2, 3$.

So we can assume that $k \ge 4$ and that each edge contains exactly one site
in its interior. Let $l$, $b$, and $r$ denote the colors of the singular points on the
left, bottom, and right edge of $N$, respectively. We enlarge $N$ by moving its
upper edge upwards until one of the following events occurs. Either the upper
edge hits a point of color $b$; then we have obtained a so-called enlarged candidate
with singular points on its left and right edges that contains points of color
$b$ on its top and bottom edges, but no further $b$-colored points (type 1). Or the
upper edge hits a point of color $l$ or $r$, say $l$; then we have an enlarged candidate
containing two points of color $l$ on its top and left edges, but no further points of
color $l$, and singular points on its right and bottom edges (type 2). If the upper
edge does not hit a point of color $b$, $l$, or $r$, then we obtain an enlarged candidate
with upper edge at infinity and singular points on its other three edges (type 3).

This way we have mapped each non-shrinkable rectangle to an enlarged candidate
of type 1, 2, or 3. The mapping is one-to-one: given an enlarged candidate
of any type, we can just lower its top edge until, for the first time, the lowest
point of some color is hit, and obtain a non-shrinkable rectangle. Thus, it suffices
to show that there are only $O((n-k)^2)$ enlarged candidates of each type.
In order to bound the number of type 1 rectangles we fix an arbitrary point $p$
of color $p_{col}$ and show that there are at most $O(n-k)$ type 1 rectangles $R_{ij}$ that
have $p$ on their bottom edge and another $p_{col}$-colored point $q_i$ on their top edge
to the right of $p$. Indexing is such that $q_1, \ldots, q_m$ have increasing $x$-coordinates.
Let $r_{ij}$ be the singular point on the right edge of rectangle $R_{ij}$, for $1 \le j \le m_i$.
Clearly, different rectangles $R_{ij}$ with the same index $i$ must have different right
points $r_{ij}$. Since none of these rectangles can contain a third point of color $p_{col}$,
point $q_{i+1}$ must be below $q_i$ and to the right of all points $r_{ij}$, $1 \le j \le m_i$.
Trivially, all points $r_{i+1,j}$, $1 \le j \le m_{i+1}$, are to the right of $q_{i+1}$. A continuation
of this argument shows that all points $r_{ij}$ are pairwise different. This proves the
claim, since only $n - k + 1$ points can be on the right edge, and the same holds for the
bottom edge.

The argument for the type 2 rectangles is quite similar. We fix the point $p$
on the left edge and consider all rectangles $R_{ij}$ of type 2 that have a point $q_i$
of the same color on their top edges. Again, all singular points $r_{ij}$ on the right
edges are pairwise different.

The unbounded enlarged candidates of type 3 are even easier to count: for a
fixed point on the bottom edge there can be only $n - k + 1$ of them, since for each
possible left edge there is at most one right edge, if any, and only $n - k + 1$ sites
can be on the left edge. (By the way, $k$ is also an upper bound on the number of
rectangles of this type for a fixed bottom edge, as will follow from Lemma 2.)

It remains to show the lower bound, i.e., we are given numbers $n$ and $k$, and
we want to place $n$ sites with $k$ colors such that there are $\Omega((n-k)^2)$ different
non-shrinkable rectangles. To this end, we make a construction as sketched in
Fig. 1.

We construct three groups of sites. The first group consists of $\lfloor (n-k)/2 \rfloor + 1$
sites of color 1 and is placed on the line $y = -1 - x$ at positions with negative
$x$- and $y$-coordinates; the second group has $\lceil (n-k)/2 \rceil + 1$ sites, but of color 2,
and is placed on the line $y = 1 - x$ at positions with positive coordinates; and
the third group contains one site of each of the other $k - 2$ colors and is placed
very close to the origin.

Now each rectangle spanned by a site of color 1 as the lower left corner and
by a site of color 2 as the upper right corner contains all colors and is one of
$\Omega((n-k)^2)$ non-shrinkable rectangles. □
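Lemma 1 and the lower-bound construction can be checked on small inputs with a brute-force counter. The sketch below, under the general-position assumption, enumerates all rectangles whose edges lie at site coordinates and keeps those that contain all $k$ colors but lose a color whenever a single edge is pushed inward, which is equivalent to Definition 1. The coordinates in the hypothetical helper `lower_bound_instance` are our own illustrative choice of positions on the two lines of Fig. 1, with colors labeled $1, \ldots, k$. For this construction the count should be exactly $(\lfloor (n-k)/2 \rfloor + 1)(\lceil (n-k)/2 \rceil + 1)$, because every color-spanning rectangle contains the box spanned by some color-1 site and some color-2 site.

```python
from itertools import product
from typing import List, Optional, Sequence, Set, Tuple

Site = Tuple[float, float, int]  # (x, y, color)


def _colors(sites: Sequence[Site], x1, x2, y1, y2, open_side: Optional[str] = None) -> Set[int]:
    """Colors in the closed rectangle, ignoring sites on the edge named open_side."""
    found = set()
    for x, y, c in sites:
        if not (x1 <= x <= x2 and y1 <= y <= y2):
            continue
        on_edge = {"left": x == x1, "right": x == x2, "bottom": y == y1, "top": y == y2}
        if open_side is not None and on_edge[open_side]:
            continue  # this site would be lost by pushing that edge inward
        found.add(c)
    return found


def count_non_shrinkable(sites: Sequence[Site], k: int) -> int:
    """Count color-spanning rectangles in which every edge is needed."""
    xs = sorted({s[0] for s in sites})
    ys = sorted({s[1] for s in sites})
    count = 0
    for x1, x2, y1, y2 in product(xs, xs, ys, ys):
        if x1 > x2 or y1 > y2 or len(_colors(sites, x1, x2, y1, y2)) < k:
            continue
        if all(len(_colors(sites, x1, x2, y1, y2, side)) < k
               for side in ("left", "right", "bottom", "top")):
            count += 1
    return count


def lower_bound_instance(n: int, k: int) -> List[Site]:
    """Illustrative coordinates for the Lemma 1 lower-bound construction."""
    m1 = (n - k) // 2 + 1      # color 1 on y = -1 - x, negative coordinates
    m2 = (n - k + 1) // 2 + 1  # color 2 on y = 1 - x, positive coordinates
    sites = [(-i / (m1 + 1), -1 + i / (m1 + 1), 1) for i in range(1, m1 + 1)]
    sites += [(i / (m2 + 1), 1 - i / (m2 + 1), 2) for i in range(1, m2 + 1)]
    # One site of each remaining color close to the origin, with distinct coordinates.
    sites += [(1e-3 * j, 1.3e-3 * j, c) for j, c in enumerate(range(3, k + 1), start=1)]
    return sites
```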

2.2 An Improved Approach 

The question arises whether the proof method for the $O((n-k)^2)$ upper bound can be
used for efficiently constructing all non-shrinkable rectangles. In fact, we are able
to enumerate all enlarged candidates of types 1, 2, and 3 within time $O(n^2 \log k)$.
The difficulty lies in efficiently moving down the upper edges of these rectangles
in order to obtain non-shrinkable rectangles. This can be done within the same
time bound for types 2 and 3, but seems quite hard to do for the type 1
rectangles.
Fig. 1. Each rectangle spanned by a site of color 1 and a site of color 2 is non-shrinkable.



Therefore, we resort to a more direct method that is a refinement of
Algorithm 1. Instead of fixing the lower left corner, let us try to fix the upper
and lower edges, i.e., for each pair of sites $a$ and $b$ with $a_y < b_y$ we check all
non-shrinkable rectangles with lower $y$-coordinate $a_y$ and upper $y$-coordinate $b_y$.

We consider conditions that must be fulfilled by such a non-shrinkable rectangle
with left edge at $l$ and right edge at $r$; $l$ and $r$ may coincide with $a$ or $b$.
First, it is clear that $a$ and $b$ must be contained in the rectangle. Second, the
interior of the rectangle must not contain sites of the colors of $a$ and $b$. Third,
the colors of $l$ and $r$ are not contained in the interior either.

More formally, for a given color $c$ we define the following numbers.

$$L_c(a,b) = \max\{\, p_x \mid p \in S,\ p_{col} = c,\ a_y < p_y < b_y \text{ and } p_x < a_x \,\}$$

$$R_c(a,b) = \min\{\, p_x \mid p \in S,\ p_{col} = c,\ a_y < p_y < b_y \text{ and } p_x > a_x \,\}$$

In other words, $L_c(a,b)$ is the maximum $x$-coordinate of all sites of color $c$ in the
horizontal strip between $a$ and $b$ and to the left of $a_x$, and $R_c(a,b)$ is the analogous
minimum to the right of $a_x$; they take on the values $-\infty$ resp. $+\infty$ if no such
site exists.
Now the first condition above means that

$$l_x \le \min(a_x, b_x) \quad\text{and}\quad r_x \ge \max(a_x, b_x). \tag{1}$$

The second condition can be expressed as

$$l_x > \max\big(L_{a_{col}}(a,b), L_{b_{col}}(a,b)\big) \quad\text{and}\quad r_x < \min\big(R_{a_{col}}(a,b), R_{b_{col}}(a,b)\big). \tag{2}$$

In other words, we have an $x$-interval for the possible positions of the left edge
of a non-shrinkable rectangle, from $\max(L_{a_{col}}(a,b), L_{b_{col}}(a,b))$ to $\min(a_x, b_x)$, and another one for the right edge.
The third condition transforms to

$$l_x = L_{l_{col}}(a,b) \ \text{ if } l \ne a, b \quad\text{and}\quad r_x = R_{r_{col}}(a,b) \ \text{ if } r \ne a, b, \tag{3}$$

i.e., the site $l$ on the left edge, if it is not $a$ or $b$ itself, is the $x$-maximal site of
its color in the horizontal strip between $a$ and $b$ and to the left of $\min(a_x, b_x)$,
and correspondingly for $r$. Therefore the following assertion holds.

Lemma 2. Let $a$ and $b$ be two sites of $S$. Independently of $n$, there are at most
$k - 1$ non-shrinkable rectangles with lower edge at $a$ and upper edge at $b$.

Proof. According to (3), the left edge of such a non-shrinkable rectangle has only
$k - 2$ possible positions if its color is different from $a_{col}$ and $b_{col}$, and $\min(a_x, b_x)$
as one additional possibility. □

For fixed $a$, the quantities $L_c(a,b)$ and $R_c(a,b)$ can easily be updated as $b$
steps through the sites in $y$-order. For each $b$ it remains to match the correct
pairs of sites at the left and at the right edges; this is done by the following
algorithm.

Algorithm 2 The sites are sorted in $y$-order. For each site $a$ we do the
following to find all candidate rectangles with lower $y$-coordinate $a_y$.

1. Let $L$ and $R$ be arrays over all colors, initialized to $-\infty$ resp. $+\infty$;
they will contain the values $L_c(a,b)$ and $R_c(a,b)$ for the current $a$ and
$b$ and for all colors $c$.

The lists SortL and SortR will contain all sites that currently
contribute an entry to $L$ resp. $R$, sorted in $x$-direction.

2. For all sites $b$ with $b_y > a_y$, in $y$-order, we do the following. Perform steps 2(a) to 2(c)
only if $b_{col} \ne a_{col}$; in any case perform step 2(d).

a) $InclL := \min(a_x, b_x)$; $ExclL := \max(L_{a_{col}}, L_{b_{col}})$;

$InclR := \max(a_x, b_x)$; $ExclR := \min(R_{a_{col}}, R_{b_{col}})$.

In list SortL we mark the sites with $x$-coordinates greater than
$ExclL$ and smaller than $InclL$, and correspondingly in SortR
from $InclR$ to $ExclR$.

b) The left edge starts at the first marked element of SortL. The
right edge starts at $InclR$ and, if necessary, steps over the marked
sites in SortR until all colors are contained in the current rectangle.

c) As long as the marked parts of SortL and SortR are not
exhausted, we repeat the following steps.

i. The left edge advances over the marked sites in SortL and
finally to $InclL$ as long as the rectangle contains all colors; when
this stops, we have found a candidate.

ii. The right edge advances over the marked sites in SortR until
it reaches the color of the site at the current left edge.

d) If $b_x < a_x$
then $L_{b_{col}} := \max(L_{b_{col}}, b_x)$, also unmark and update SortL;
else $R_{b_{col}} := \min(R_{b_{col}}, b_x)$, also unmark and update SortR.
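The conditions behind Algorithm 2 can also be tried out directly. The sketch below fixes the bottom site $a$ and the top site $b$, computes $L_c(a,b)$ and $R_c(a,b)$ by a scan, takes each admissible finite $L_c$ value (or $\min(a_x, b_x)$) as the left edge, and chooses the smallest right edge that still picks up every color not reached on the left. It is a simplified variant for experimentation, running in roughly $O(n^2(n + k^2))$ time, not the list-marking procedure of steps 2(a) to 2(d); it also ignores condition (2), which only removes rectangles that are color-spanning but shrinkable, so the minima are unaffected. Colors are assumed to be $0, \ldots, k-1$ and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Site = Tuple[float, float, int]  # (x, y, color); colors are 0, ..., k-1


def smallest_rectangle_strips(sites: Sequence[Site], k: int) -> Tuple[float, float]:
    """(min perimeter, min area), fixing a bottom site a and a top site b."""
    best_perimeter = best_area = math.inf
    for a in sites:
        for b in sites:
            if a[1] >= b[1] or a[2] == b[2]:
                continue  # need a_y < b_y and different colors
            # L[c], R[c]: the quantities L_c(a, b) and R_c(a, b) from the text.
            L = {c: -math.inf for c in range(k)}
            R = {c: math.inf for c in range(k)}
            for x, y, c in sites:
                if a[1] < y < b[1]:
                    if x < a[0]:
                        L[c] = max(L[c], x)
                    elif x > a[0]:
                        R[c] = min(R[c], x)
            incl_l, incl_r = min(a[0], b[0]), max(a[0], b[0])
            others = [c for c in range(k) if c not in (a[2], b[2])]
            # Left edge: a finite L_c value not right of incl_l, or incl_l itself.
            lefts = {v for v in L.values() if -math.inf < v <= incl_l} | {incl_l}
            for left in lefts:
                # Every color not reached on the left must be reached on the right.
                right = max([incl_r] + [R[c] for c in others if L[c] < left])
                if right == math.inf:
                    continue  # some color is missing from the strip
                width, height = right - left, b[1] - a[1]
                best_perimeter = min(best_perimeter, 2 * (width + height))
                best_area = min(best_area, width * height)
    return best_perimeter, best_area
```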



Lemma 3. The candidates reported by Algorithm 2 are precisely all non-shrinkable
rectangles. Its running time is in $O(nk(n-k))$.



Proof. Let us consider what happens if steps 2(a) to 2(c) are executed for certain
sites $a$ and $b$.

Step 2(d) has been performed for all previous values of $b$, so $L$ and $R$ contain
the correct $L_c(a,b)$ and $R_c(a,b)$ for all colors. Note that this also holds for
$b_{col}$, because the update of $L$ and $R$ concerning $b$ is done at the end of the loop.

SortL and SortR contain the sites corresponding to the values of $L$ resp. $R$,
and only these values are possible left resp. right edges of the rectangle, as we
have seen earlier. The marked parts correspond to the intervals $(ExclL, InclL)$
resp. $(InclR, ExclR)$ and reflect conditions (1) and (2); sites $a$ or $b$ can also be at
the left or right edges of a non-shrinkable rectangle, and this is taken into account
by starting the right edge at $InclR$ in step 2(b) and finishing with the left edge at
$InclL$ in step 2(c).

For each possible left edge the matching right edge, if any, is found in
steps 2(c)i and 2(c)ii; these steps are very similar to steps 2 and 3 of Algorithm 1.
The case in which there is no non-shrinkable rectangle for $a$ at the bottom
edge and $b$ at the top is quickly detected: if some colors are missing in the
horizontal strip between $a$ and $b$, then step 2(b) already does not succeed. If another
site of color $a_{col}$ or $b_{col}$ is contained in the rectangle spanned by $a$ and $b$, then
$ExclL > InclL$ or $InclR > ExclR$, and one of the lists is not marked at all. Finally,
the case $b_{col} = a_{col}$ is explicitly excluded in step 2.

The running time can be estimated as follows. Site $a$ at the bottom edge
needs to be iterated only over the first $n - k + 1$ sites in $y$-order, so this factor is
contributed. Factor $n$ is for the loop over all $b$ (not $n - k$, because the updates in
step 2(d) need to be executed for all $b$ above $a$). Finally, the repetition of steps 2(c)i
and 2(c)ii results in a factor $k$, as do the (un)marking and the updates of
the sorted lists. □



For small, and in particular for constant, $k$, Algorithm 2 is the method of
choice, because it is very simple and can be implemented using just a few lines
of code. On the other hand, for large $k$ the $O(n(n-k)^2)$ method of Algorithm 1
is preferable. In the general case, however, there is still room for improvement.
2.3 Maximal Elements

Definition 2. A maximal element of a set $T$ of points in the plane is a point
$p \in T$ such that there is no $q \in T$ with $p_x > q_x$ and $p_y < q_y$.

Note that for our special purpose we have slightly deviated from the usual
definition (see […]): we are interested in maximal elements in the upper left
(instead of upper right) direction, see Fig. 2. Our maximal elements are those that are not
dominated from the left and above by another point of the set.
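Maximal elements in the upper-left sense of Definition 2 can be found with a single sweep in $x$-order, keeping the highest $y$ seen so far. This is a static sketch assuming distinct coordinates; it does not implement the dynamic structure of Overmars and van Leeuwen used in Theorem 1. The helper `staircase_pairs` pairs successive maxima as in Lemma 6 below, returning the left edge $L_{c'}$ and the right edge $R_c$ for each pair; both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def upper_left_maxima(points: Sequence[Point]) -> List[Point]:
    """Maximal elements in the sense of Definition 2, in increasing x-order.

    p is kept iff no point lies strictly to its left and strictly above it.
    Assumes distinct coordinates; O(m log m) because of the sort.
    """
    maxima: List[Point] = []
    highest = -math.inf
    for x, y in sorted(points):
        if y > highest:  # nothing further left is higher
            maxima.append((x, y))
            highest = y
    return maxima


def staircase_pairs(maxima: Sequence[Point]) -> List[Tuple[float, float]]:
    """For successive maxima T_c, T_c' (left to right): (left edge L_c', right edge R_c)."""
    return [(nxt[0], cur[1]) for cur, nxt in zip(maxima, maxima[1:])]
```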



Fig. 2. Maximal elements of a point set in the upper left direction. 



Now consider, for given $a$ and $b$, the values of $L_c(a,b)$ and $R_c(a,b)$. We
transform these values to points in 2D, using $L$ for the $x$-coordinates and $R$ for
the $y$-coordinates:

$$T_c(a,b) = \big(L_c(a,b),\, R_c(a,b)\big) \quad \text{for all colors } c \ne a_{col}, b_{col}.$$

Some of the coordinates of these points may be $\pm\infty$. With $T(a,b)$ we denote
the set of all points $T_c(a,b)$. The next lemma shows that maximal elements are
closely related to spanning colors.

Lemma 4. Assume that the horizontal strip between $a$ and $b$ contains all colors.
The point $T_c(a,b)$ for some color $c$ is a maximal element of $T(a,b)$ if and only
if the rectangle with $a$ and $b$ at the bottom and top edges and with $L_c(a,b)$ as left
and $R_c(a,b)$ as right edge contains all colors with the possible exception of $b_{col}$.

Proof. Let $T_c(a,b)$ be a maximal element of $T(a,b)$. Suppose there is a color
$c'$ which is not contained in the rectangle between $a$, $b$, $L_c(a,b)$, and $R_c(a,b)$.
Then $L_{c'}(a,b) < L_c(a,b)$ and $R_{c'}(a,b) > R_c(a,b)$, and $T_{c'}$ dominates $T_c(a,b)$, a
contradiction. Conversely, if all colors are contained in the rectangle, then $T_c(a,b)$
must be a maximal element because it cannot be dominated by any other color.

□

Now we have an interesting relation between non-shrinkable rectangles and 
maximal elements. 
Lemma 5. For a non-shrinkable rectangle with sites $a$, $b$, $l$, $r$ at the bottom,
top, left, and right edges with $l \ne a, b$ and $r \ne a, b$, the points $T_{l_{col}}(a,b)$ and $T_{r_{col}}(a,b)$ are
two successive maximal elements of the set of points in $T(a,b)$.

Proof. Assume that $T_{l_{col}}(a,b)$ is dominated by some $T_c(a,b)$. Clearly we have
$L_c(a,b) < L_{l_{col}}(a,b) = l_x$, but also $R_c(a,b) > R_{l_{col}}(a,b) > r_x$ holds, because $l_{col}$
cannot appear a second time in the rectangle. This means that color $c$ is not
contained in the rectangle, a contradiction; the argument for $T_{r_{col}}(a,b)$ is analogous.

Now assume some $T_c(a,b)$ is a maximal element between the two maximal
elements $T_{r_{col}}(a,b)$ and $T_{l_{col}}(a,b)$. Then we have $L_{r_{col}}(a,b) < L_c(a,b) <
L_{l_{col}}(a,b) = l_x$ and $r_x = R_{r_{col}}(a,b) < R_c(a,b) < R_{l_{col}}(a,b)$, and again $c$ is not
contained in the rectangle. □

And the converse is also true, in some sense. 

Lemma 6. Consider two sites $a$ and $b$ and two colors $c, c' \ne a_{col}, b_{col}$ such that
$T_c(a,b)$ and $T_{c'}(a,b)$ are two successive maximal elements of $T(a,b)$, and assume
that the horizontal strip between $a$ and $b$ contains all colors. Then the rectangle
between $a$, $b$, $l$ with $l_x = L_{c'}(a,b)$ as left edge, and $r$ with $r_x = R_c(a,b)$ as right
edge is non-shrinkable if additionally conditions (1) and (2) hold.

Proof. From Lemma 4 we know that the rectangle between $a$, $b$, $L_c(a,b)$, and
$R_c(a,b)$ contains all colors. Now let the left edge move right until the rectangle
is non-shrinkable. The left edge must now be situated at $L_{c'}(a,b)$; otherwise there
would be another maximal element between $T_c(a,b)$ and $T_{c'}(a,b)$. Conditions (1)
and (2) are necessary to guarantee that no other sites of color $a_{col}$ or $b_{col}$ are
contained in the rectangle. □



Theorem 1. Given $n$ sites and $k$ colors, the smallest color-spanning rectangle
can be found in time $O(n(n-k)\log^2 k)$.

Proof. We modify Algorithm 2. The main difference is in maintaining a dynamic
tree MaxElem of maximal elements instead of the lists SortL and SortR.

In step 2(d), MaxElem is now updated if the value of $L_{b_{col}}$ or $R_{b_{col}}$ has changed;
this can be done in time $O(\log^2 k)$ using the method of Overmars and van
Leeuwen […].

The marking of the lists is replaced in the following way. The values $ExclL$,
$InclL$, $InclR$, and $ExclR$ are computed as before. Then the subsequence of elements
in MaxElem is extracted that is included in $(ExclL, InclL)$ in $x$-direction
as well as in $(InclR, ExclR)$ in $y$-direction. This can be done in time $O(\log k)$ plus
the length of this subsequence, which in turn is essentially the same as the number
of non-shrinkable rectangles reported. It remains to report the matchings
between left and right edges as described in Lemma 6.

So the running time of this method is $O(n(n-k)\log^2 k)$ plus the total number
of reported non-shrinkable rectangles, which is fortunately bounded by
$O((n-k)^2)$, see Lemma 1. □
3 The Narrowest Color-Spanning Strip

The narrowest color-spanning strip problem asks for two parallel lines in the 
plane that contain all colors in between them such that their distance is mini- 
mized. 

Notice that the solution strip must have three sites of three different colors
on its boundary, because if there were only two, or if two of them had the same color,
the strip could be shrunk by rotation.

A brute-force approach could work as follows. Consider the $O(n^2)$ lines
defined by two sites of different colors, and sort them by slope in $O(n^2\log n)$ time.
Start with one of them and project all the other sites, following the direction
of the line, onto any perpendicular line. By sorting the projected points and
walking along them, we can find the solution in that direction in $O(n\log n)$ time.
Now, at each change of direction, we only need to update the order of the
projected points in $O(1)$ time and explore the points, again walking along them, in
$O(n)$ time to find the solution for the new direction. Hence, the algorithm runs
in $O(n^3)$ time.
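A sketch of this brute-force idea in a simplified form: it re-sorts the projections for every candidate direction instead of updating the order in $O(1)$, so it runs in $O(n^3\log n)$ rather than $O(n^3)$ time. It assumes $k \ge 2$, sites given as `(x, y, color)` with colors $0,\dots,k-1$, and, following the observation above, tries only directions of lines through two sites of different colors. The "walk along the projected points" is a two-pointer sliding window; all function names are hypothetical.

```python
import math
from collections import defaultdict

def min_color_window(values_colors, k):
    """Length of the shortest interval of projected values that contains all k colors."""
    vc = sorted(values_colors)
    count = defaultdict(int)
    covered, best, lo = 0, math.inf, 0
    for hi in range(len(vc)):
        c = vc[hi][1]
        count[c] += 1
        if count[c] == 1:
            covered += 1
        while covered == k:                      # shrink from the left while all colors remain
            best = min(best, vc[hi][0] - vc[lo][0])
            c_lo = vc[lo][1]
            count[c_lo] -= 1
            if count[c_lo] == 0:
                covered -= 1
            lo += 1
    return best

def narrowest_strip_bruteforce(sites, k):
    """Width of the narrowest strip containing at least one site of every color."""
    best = math.inf
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            (x1, y1, c1), (x2, y2, c2) = sites[i], sites[j]
            if c1 == c2 or (x1, y1) == (x2, y2):
                continue
            dx, dy = x2 - x1, y2 - y1
            norm = math.hypot(dx, dy)
            nx, ny = -dy / norm, dx / norm       # unit normal to the candidate direction
            proj = [(x * nx + y * ny, c) for x, y, c in sites]
            best = min(best, min_color_window(proj, k))
    return best
```

For the sites `(0, 0)`, `(4, 0)`, `(2, 1)` with three colors, plus a far-away extra site of the first color, the narrowest strip is the horizontal one of width 1.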

When the direction changes, the cluster of points that gives the optimal 
solution may completely change. This is the reason why we don’t envisage a 
more clever updating. Using techniques of inversion and outer envelopes we 
obtain a much better algorithm, as the following theorem states. 

Theorem 2. Given n sites and k colors, the narrowest color-spanning strip can
be found in $O(n^2\alpha(k)\log k)$ time.

The proof is omitted here because of lack of space. 



4 Conclusions 

We have solved two optimization problems by giving algorithms that are likely
to be close to optimal. The narrowest color-spanning strip problem can be solved
in $O(n^2\alpha(k)\log k)$ time, and it is $n^2$-hard; this proof is not
very difficult but is omitted here for brevity.

On the other hand, the smallest color-spanning rectangle problem can be
solved in $O(n(n-k)\log^2 k)$ time, while we have a tight quadratic bound for
the number of non-shrinkable rectangles. It would be interesting to have a formal
proof (or a refutation) that this number is also a lower bound for this problem.

We must admit, though, that we do not have a lower bound better than
$\Omega(n\log n)$ for the problem; this one, at least, can be obtained in several ways. For
example, the problem of finding the dichromatic closest pair can be transformed
to our problem with $k = 2$, or the maximum gap problem can be transformed
to it with $k = n/2$. Of course, this is not really satisfying, because it seems that
a quadratic lower bound or an $n^2$-hardness result is possible.

The smallest color-spanning rectangle with an arbitrary orientation would 
be the next natural generalization. 
Abstract. We look at time-space tradeoffs for the static membership
problem in the bit-probe model. The problem is to represent a set of
size up to n from a universe of size m using a small number of bits so
that, given an element of the universe, its membership in the set can be
determined with as few bit probes to the representation as possible.

We show several deterministic upper bounds for the case when the
number of bit probes is small, by explicit constructions, culminating in one
that uses o(m) bits of space where membership can be determined with
$\lceil\lg\lg n\rceil + 2$ adaptive bit probes. We also show two tight lower bounds
on space for a restricted two-probe adaptive scheme.



1 Introduction 

We look at the static membership problem: Given a subset S of up to n keys 
drawn from a universe of size m, store it so that queries of the form “Is x in S?”
can be answered quickly. We study this problem in the bit-probe model where 
space is counted as the number of bits used to store the data structure and time 
as the number of bits of the data structure looked at in answering a query. 

A simple characteristic bit vector gives a solution to the problem using m
bits of space in which membership queries can be answered using one bit probe.
On the other hand, the structures given by Fredman et al., Brodnik and
Munro, and Pagh can be used to get a scheme that uses $O(n\lg m)$ bits of
space in which membership queries can be answered using $O(\lg m)$ bit probes.
Recently, Pagh has given a structure that requires $O(s_{m,n})$ bits of space
and supports membership queries using $O(\lg(m/n))$ bit probes to the structure,
where $s_{m,n} = O(n\lg(m/n))$ is the information-theoretic lower bound on space
for any structure storing an n-element subset of an m-element universe.

Buhrman et al. [2] have shown that both the above schemes are optimal. In
particular, they have shown that any deterministic scheme that answers
membership queries using one bit probe requires at least m bits of space, and any
deterministic scheme using $O(s_{m,n})$ bits of space requires at least $\Omega(\lg(m/n))$
probes to answer membership queries. They have considered the intermediate
ranges and have given some upper and lower bounds for randomized as well as
deterministic versions. Their main result is that the optimal $O(n\lg m)$ bits (for
$n <$ …) and one bit probe per query are sufficient if the query algorithm
is allowed to make (two-sided) errors with a small probability. For the
deterministic case, however, they have given only some non-constructive upper bounds. They
have also given some explicit structures for the case when t is large ($t > \lg n$).

Our main contribution in this paper is a set of improved deterministic upper
bounds for the problem using explicit constructions, particularly for small values
of t. For sets of size at most 2, we give a scheme that uses $O(m^{2/3})$ bits of space
and answers queries using 2 probes. This improves on the scheme of Buhrman
et al. [2], whose existence was shown using probabilistic arguments. We also show that the space bound is
optimal for a restricted two-probe scheme. We then generalize this to a $(\lceil\lg\lg n\rceil +
2)$-probe scheme for storing sets of size at most n, which uses o(m) bits of space.
This is the best known constructive scheme (in terms of the number of bit probes
used) for general n that uses o(m) bits of space, though it is known (using
probabilistic arguments) that there exists a scheme using o(m) bits of space
where queries can be answered using a constant number of bit probes.

The next section introduces some definitions. The following section gives
improved upper bounds for deterministic schemes. In Section 4, we give some
space lower bounds for a restricted class of two-probe schemes, matching our
upper bound. Finally, Section 5 concludes with some remarks and open problems.



2 Definitions 

We reproduce the definition of a storing scheme introduced in [2]. An (n, m, s)-storing
scheme is a method for representing any subset of size at most n over a
universe of size m as an s-bit string. Formally, an (n, m, s)-storing scheme is a
map $\phi$ from the subsets of size at most n of $\{1, 2, \ldots, m\}$ to $\{0,1\}^s$. A
deterministic (m, s, t)-query scheme is a family of m Boolean decision trees $\{T_1, T_2, \ldots, T_m\}$
of depth at most t. Each internal node in a decision tree is marked with an index
between 1 and s, indicating the address of a bit in an s-bit data structure. All
the edges are labeled “0” or “1”, indicating the value of the bit read at the parent node.
The leaf nodes are marked “Yes” or “No”. Each tree $T_i$ induces a map from
$\{0,1\}^s \to \{\text{Yes}, \text{No}\}$. An (n, m, s)-storing scheme and an (m, s, t)-query scheme
together form an (n, m, s, t)-scheme, which solves the (n, m)-membership
problem if for all $S$ and $x$ with $|S| \le n$ and $x \in U$: $T_x(\phi(S)) = \text{Yes}$ if and only if $x \in S$. A
non-adaptive query scheme is a deterministic scheme where, in each decision tree, all
nodes on a particular level are marked with the same index.

We follow the convention that whenever the universe $\{1, \ldots, m\}$ is divided
into blocks of size b (or m/b blocks), the elements $\{(i-1)b + 1, \ldots, ib\}$ of
the universe belong to the ith block, for $1 \le i \le \lfloor m/b\rfloor$, and the remaining
(at most b) elements belong to the last block. For integers x and a we define
$div(x, a) = \lfloor x/a\rfloor$ and $mod(x, a) = x - a\cdot div(x, a)$. To simplify the notation, we
ignore integer rounding at some places where it does not affect
the asymptotic analysis.
3 Upper Bounds for Deterministic Schemes

As has been observed, the static dictionary structure given by Fredman, Komlós
and Szemerédi can be modified to give an adaptive (n, m, s, t)-scheme with
$s = O(nk\,m^{1/k})$ and $t = O(\lg n + \lg\lg m) + k$, for any parameter $k \ge 1$. This
gives a scheme when the number of probes is larger than $\lg n$. In this section,
we look at schemes which require fewer probes, albeit more
space.

For two-element sets, Buhrman et al. [2] have given a non-adaptive scheme
that uses $O(\sqrt{m})$ bits of space and answers queries using 3 probes. If the query
scheme is adaptive, there is an even simpler structure. Our starting point is a
generalization of this scheme to larger n.

Theorem 1. There is an explicit adaptive (n, m, s, t)-scheme with $t = \lceil\lg(n+1)\rceil + 1$
and $s = (n + \lceil\lg(n+1)\rceil)\sqrt{m}$.

Proof. The structure consists of two parts. We divide the universe into blocks
of size $\sqrt{m}$. The first part consists of a table T of size $\sqrt{m}$, each entry
corresponding to a block. We call a block non-empty if at least one element from
the given set falls into that block, and empty otherwise. For each non-empty
block, we store its rank (the number of non-empty blocks appearing before and
including it) in the table entry of that block, and we store a string of zeroes for each
empty block. Since the rank can be any number in the range $[1, \ldots, n]$ (and we
store a zero for the empty blocks), we need $\lceil\lg(n+1)\rceil$ bits for storing each entry
of the table T.

In the second part, we store the bit vectors corresponding to the non-empty
blocks in the order in which they appear in the first part. For convenience, we call
the jth bit vector table $T_j$. Thus the total space required for the structure is
at most $(n + \lceil\lg(n+1)\rceil)\sqrt{m}$ bits.

Every element $x \in [m]$ is associated with $l + 1$ locations, where l is the number
of non-empty blocks: $t(x) = div(x, \sqrt{m})$ in table T and $t_j(x) = mod(x, \sqrt{m})$
in table $T_j$ for $1 \le j \le l$. Given an element x, the query scheme first reads the
entry j at location t(x) in table T. If $j = 0$, the scheme answers ‘No’. Otherwise,
it looks at the bit $T_j(t_j(x))$ in the second part and answers ‘Yes’ if and only if
it is a one. □
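The two-part structure can be sketched directly as a flat bit array with a probe counter. This is an illustration under assumptions not in the text: the universe is 0-indexed ($\{0, \ldots, m-1\}$), the block size is $\lceil\sqrt{m}\rceil$, ranks are stored big-endian, and the class name `RankBlockScheme` is hypothetical.

```python
import math

class RankBlockScheme:
    """Two-part scheme of Theorem 1 over the 0-indexed universe {0, ..., m-1}."""

    def __init__(self, S, m, n):
        S = set(S)
        assert len(S) <= n
        self.b = math.isqrt(m - 1) + 1              # block size, ceil(sqrt(m))
        self.w = max(1, n.bit_length())             # ceil(lg(n+1)) bits per entry of T
        nblocks = (m + self.b - 1) // self.b
        nonempty = sorted({x // self.b for x in S})
        rank = {blk: r for r, blk in enumerate(nonempty, start=1)}
        bits = []
        # Part 1: table T, a w-bit rank per block (all zeroes for an empty block).
        for blk in range(nblocks):
            r = rank.get(blk, 0)
            bits.extend((r >> (self.w - 1 - i)) & 1 for i in range(self.w))
        self.second = len(bits)                     # start of the second part
        # Part 2: table T_j = characteristic vector of the non-empty block of rank j.
        for blk in nonempty:
            bits.extend(1 if blk * self.b + off in S else 0 for off in range(self.b))
        self.bits = bits

    def query(self, x):
        """Return (is_member, number_of_bit_probes)."""
        j, probes = 0, 0
        base = (x // self.b) * self.w               # entry t(x) = div(x, b) of T
        for i in range(self.w):
            j = (j << 1) | self.bits[base + i]
            probes += 1
        if j == 0:
            return False, probes
        bit = self.bits[self.second + (j - 1) * self.b + x % self.b]   # T_j(t_j(x))
        return bit == 1, probes + 1
```

With $m = 100$ and $S = \{5, 42, 47\}$ ($n = 3$), every query uses at most $\lceil\lg 4\rceil + 1 = 3$ probes and the array has $10 \cdot 2 + 2 \cdot 10 = 40$ bits, within the $(n + \lceil\lg(n+1)\rceil)\sqrt{m} = 50$ bound.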

If only two probes are allowed, Buhrman et al. [2] have shown that any
non-adaptive scheme must use m bits of space. For sets of size at most 2, they have
also proved the existence of an adaptive scheme using 2 probes and bits 

of space. We improve it to the following: 

Theorem 2. There is an explicit adaptive scheme that stores sets of size at
most 2 from a universe of size m using $O(m^{2/3})$ bits and answers queries using
2 bit probes.

Proof. Divide the universe into blocks of size $m^{1/3}$ each. There are $m^{2/3}$ blocks.
Group $m^{1/3}$ consecutive blocks into a superblock. There are $m^{1/3}$ superblocks
of size $m^{2/3}$ each.
The storage scheme consists of three tables T, $T_0$ and $T_1$, each of size $m^{2/3}$
bits. Each element $x \in [m]$ is associated with three locations, t(x), $t_0(x)$ and
$t_1(x)$, one in each of the three tables, as defined below. Let $b = m^{2/3}$ and
$b_1 = m^{1/3}$. Then $t(x) = div(x, b_1)$, $t_0(x) = mod(x, b)$ and $t_1(x) = div(x, b)\,b_1 +
mod(x, b_1)$. Given an element $x \in [m]$, the query scheme first looks at $T(t(x))$.
If $T(t(x)) = j$, it looks at $T_j(t_j(x))$ and answers ‘Yes’ if and only if it is 1, for
$j \in \{0, 1\}$.
To represent a set $\{x, y\}$, if both elements belong to the same superblock
(i.e., if $div(x, b) = div(y, b)$), then we set the bits $T(t(x))$ and $T(t(y))$ to 0 and all
other bits in T to 1; we set $T_0(t_0(x))$ and $T_0(t_0(y))$ to 1 and all other bits in $T_0$ and
$T_1$ to 0. In other words, in this case we store in $T_0$ the characteristic vector of the superblock
containing both elements.

Otherwise, if the two elements belong to different superblocks, we set
$T(t(x))$, $T(t(y))$, $T_1(t_1(x))$ and $T_1(t_1(y))$ to 1 and all other bits in T, $T_0$ and
$T_1$ to 0. In this case, each superblock has at most one non-empty block,
containing one element. So in $T_1$, for each superblock, we store the characteristic vector
of the only non-empty block in it (if it exists), or of any one block in it (which is a
sequence of zeroes) otherwise. One can easily verify that the storage scheme is
valid and that the query scheme answers membership queries correctly. □
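The three-table construction can be checked exhaustively on a small universe. The sketch assumes $m = c^3$ for an integer c (so blocks have size $c = m^{1/3}$ and superblocks size $c^2 = m^{2/3}$) and a 0-indexed universe; the class name `TwoProbeScheme` is hypothetical.

```python
class TwoProbeScheme:
    """Theorem 2 sketch for m = c**3 and sets of size at most 2 (0-indexed universe).
    Blocks have size c = m^(1/3); superblocks have size c*c = m^(2/3)."""

    def __init__(self, S, c):
        S = sorted(set(S))
        assert len(S) <= 2
        self.c = c
        b1, b = c, c * c                      # b1 = block size, b = superblock size
        self.T = [0] * (c * c)                # one bit per block
        self.T0 = [0] * b                     # characteristic vector of one superblock
        self.T1 = [0] * (c * c)               # one block-sized slot per superblock
        if len(S) == 2 and S[0] // b == S[1] // b:
            # Same superblock: T is 1 except at the blocks of x and y; T0 holds the superblock.
            self.T = [1] * (c * c)
            for x in S:
                self.T[x // b1] = 0
                self.T0[x % b] = 1
        else:
            # Different superblocks (or |S| <= 1): each superblock has at most one element.
            for x in S:
                self.T[x // b1] = 1
                self.T1[(x // b) * b1 + x % b1] = 1

    def query(self, x):
        """Exactly two probes: T(t(x)), then T_j(t_j(x))."""
        b1, b = self.c, self.c * self.c
        if self.T[x // b1] == 0:
            return self.T0[x % b] == 1                  # t0(x) = mod(x, b)
        return self.T1[(x // b) * b1 + x % b1] == 1     # t1(x) = div(x, b)*b1 + mod(x, b1)
```

For $c = 4$ ($m = 64$), every table has $16 = m^{2/3}$ bits, and all sets of size at most 2 are answered correctly with two probes.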

One can immediately generalize this scheme to larger n to prove the
following. Notice that the number of probes is slightly smaller than that used in
Theorem 1, though the space used is larger.

Theorem 3. There is an explicit adaptive (n, m, s, t)-scheme with $t = 1 +
\lceil\lg(\lfloor n/2\rfloor + 2)\rceil$ and $s = O(m^{2/3}(n/2 + \lg(n/2 + 2) + 1))$.

Proof Sketch: The idea is to distinguish superblocks containing at least 2 
elements from those containing at most one element. 

In the first level, if a superblock contains at least 2 elements, we store its
rank among all superblocks containing at least 2 elements in the entries of all its blocks.
Since there can be at most $\lfloor n/2\rfloor$ superblocks containing at least 2 elements, the
rank can be any number in the range $\{1, \ldots, \lfloor n/2\rfloor\}$. For blocks which fall into
superblocks containing at most one element, we store the number $\lfloor n/2\rfloor + 1$ if
the block is non-empty, and a sequence of $\lceil\lg(\lfloor n/2\rfloor + 2)\rceil$ zeroes otherwise.

The second level consists of $\lfloor n/2\rfloor + 1$ bit vectors of size $m^{2/3}$ each. We
store the characteristic vector of the superblock of rank j in the jth bit vector for
$1 \le j \le l$, where l is the number of superblocks containing at least 2 elements.
We store all zeroes in the bit vectors numbered $l + 1$ to $\lfloor n/2\rfloor$. In the
$(\lfloor n/2\rfloor + 1)$st bit vector, for each superblock we store the characteristic vector
of its only non-empty block if the superblock contains exactly one element, and a
sequence of zeroes otherwise.

On query x, we look at the first-level entry of the block corresponding to
x. We answer that the element is not present if the entry is a sequence of
zeroes. Otherwise, if it is a number k in the range $[1, \ldots, \lfloor n/2\rfloor]$, we look at the
corresponding location of x in the kth bit vector in the second level (which stores
the bit vector corresponding to the superblock containing x). Otherwise (if the
number is $\lfloor n/2\rfloor + 1$), we look at the corresponding location of x in the last bit
vector and answer accordingly. □

This can be further generalized as follows. In the first level, we distinguish
the superblocks having at least k elements (for some integer k) from those with
at most $k - 1$ elements. For superblocks having at least k elements, we
store the rank of that superblock among all such superblocks in all the blocks
of that superblock. For the other superblocks, we store the rank of the block
among all non-empty blocks in that superblock if the block is non-empty, and
a sequence of zeroes otherwise. The second level has $\lfloor n/k\rfloor + k - 1$ bit
vectors of length $m^{2/3}$ each, where in the first $\lfloor n/k\rfloor$ bit vectors we store the
characteristic vectors of the at most $\lfloor n/k\rfloor$ superblocks containing at least k
elements (in order of increasing rank) and pad the rest of them
with zeroes. Each of the $(\lfloor n/k\rfloor + j)$th bit vectors, for $1 \le j \le k - 1$, stores
one block for every superblock. This block is the jth non-empty block in that
superblock if that superblock contains at least j non-empty blocks and at most
$k - 1$ elements; otherwise we store a sequence of zeroes. The query scheme is
straightforward. This results in the following.

Corollary 1. There is an explicit adaptive (n, m, s, t)-scheme with $t = 1 +
\lceil\lg(\lfloor n/k\rfloor + k)\rceil$ and $s = O(m^{2/3}(n/k + \lg(n/k + k) + k))$.
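The k-threshold construction (with $k = 2$ giving Theorem 3) can be sketched as follows. Assumptions not fixed by the text: $m = c^3$ with a 0-indexed universe, blocks of size c and superblocks of size $c^2$, and the first-level entry of a block in a light superblock stored as $\lfloor n/k\rfloor + j$ (its rank j offset past the heavy ranks) so that it names the right second-level vector. The class name `SuperblockScheme` is hypothetical, and probes are not simulated bit by bit; `width` records the first-level entry size.

```python
from collections import Counter, defaultdict

class SuperblockScheme:
    """Corollary 1 sketch (k = 2 gives Theorem 3).
    Universe {0, ..., m-1} with m = c**3; blocks of size c, superblocks of size c*c."""

    def __init__(self, S, c, n, k):
        S = set(S)
        assert len(S) <= n and k >= 2
        self.c = c
        self.h = n // k                               # at most floor(n/k) superblocks hold >= k elements
        sb_size = c * c
        per_sb = Counter(x // sb_size for x in S)
        self.first = [0] * (c * c)                    # first-level entry of every block
        # Second level: h vectors for heavy superblocks, then k - 1 "one block per superblock" vectors.
        self.vec = [[0] * sb_size for _ in range(self.h + k - 1)]
        heavy = sorted(sb for sb, cnt in per_sb.items() if cnt >= k)
        for r, sb in enumerate(heavy, start=1):
            for blk in range(sb * c, (sb + 1) * c):
                self.first[blk] = r                   # rank of the heavy superblock, in all its blocks
        light_blocks = defaultdict(list)              # light superblock -> its non-empty blocks, left to right
        for x in sorted(S):
            sb, blk = x // sb_size, x // c
            if per_sb[sb] >= k:
                self.vec[self.first[blk] - 1][x % sb_size] = 1
            else:
                if blk not in light_blocks[sb]:
                    light_blocks[sb].append(blk)
                j = light_blocks[sb].index(blk) + 1   # rank of blk among non-empty blocks of sb
                self.first[blk] = self.h + j
                self.vec[self.h + j - 1][sb * c + x % c] = 1
        self.width = (self.h + k - 1).bit_length()    # = ceil(lg(floor(n/k) + k)) bits per entry

    def query(self, x):
        c = self.c
        e = self.first[x // c]                        # reading this entry costs self.width probes
        if e == 0:
            return False
        if e <= self.h:                               # heavy: full characteristic vector of the superblock
            return self.vec[e - 1][x % (c * c)] == 1
        return self.vec[e - 1][(x // (c * c)) * c + x % c] == 1   # light: slot of x's superblock
```

With $k = 2$ and $n = 8$, `width` is $\lceil\lg(\lfloor 8/2\rfloor + 2)\rceil = 3$, so a query costs $1 + 3 = 4$ probes, matching Theorem 3.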

Choosing $k = \lceil\sqrt{n}\rceil$, we get an explicit adaptive (n, m, s, t)-scheme with
$t = 2 + \lceil\frac{1}{2}\lg n\rceil$ and $s = O(m^{2/3}\sqrt{n})$.

Actually, by choosing the block sizes and the superblock sizes appropriately as
functions of m and n, we get the following improved scheme:

Corollary 2. There is an explicit adaptive (n, m, s, t)-scheme with $t = 2 +
\lceil\frac{1}{2}\lg n\rceil$ and $s = O(m^{2/3}(n\lg n)^{1/3})$.
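One way to see where the bound of Corollary 2 comes from is a back-of-the-envelope balancing, our own reconstruction that ignores constants and rounding (the exact sizes used in the construction may differ). Take $k = \lceil\sqrt{n}\rceil$, and let $\beta$ denote the block size and $\sigma$ the superblock size (symbols introduced here only for this sketch). The first level has $m/\beta$ entries of about $\frac12\lg n$ bits, there are about $\sqrt{n}$ heavy vectors of length $\sigma$, and about $\sqrt{n}$ light vectors of length $(m/\sigma)\beta$:

$$
\begin{aligned}
s(\beta,\sigma) &\approx \frac{m}{\beta}\lg n + \sqrt{n}\,\sigma + \sqrt{n}\,\frac{m\beta}{\sigma},\\
\sigma = \sqrt{m\beta} \;&\Rightarrow\; s \approx \frac{m\lg n}{\beta} + 2\sqrt{nm\beta},\\
\beta = \Bigl(\frac{m\lg^2 n}{n}\Bigr)^{1/3} \;&\Rightarrow\; s = O\bigl(m^{2/3}(n\lg n)^{1/3}\bigr).
\end{aligned}
$$

With $\beta = \sigma^{\ldots}$ chosen this way, both remaining terms equal $m^{2/3}n^{1/3}\lg^{1/3} n$ up to a constant, which is the space bound stated in Corollary 2.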

We generalize this to the following: 

Theorem 4. There is an explicit adaptive (n, m, s, t)-scheme with $t = \lceil\lg k\rceil +
\lceil\frac{1}{k}\lg n\rceil + 1$ and $s = m^{k/(k+1)}(\lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil + k\,n^{1/k})$, for $k \ge 1$.

Proof. We divide the universe into blocks of size b (to be determined later) and
construct a complete b-ary tree with these blocks at the leaves. Let the height
of this tree be k. Thus, we have $m = b^{k+1}$, or $b = m^{1/(k+1)}$. Given a set S of n
elements from the universe, we store it using a three-level structure. We call a
block non-empty if at least one element of the given set S belongs to that block,
and empty otherwise. We define the height of a node in the tree to be the
length of the path (the number of nodes in the path) from that node to any leaf
in the subtree rooted at that node. Note that the height of the root is $k + 1$ and
that of any leaf is one.

In the first level we store an index in the range $[0, \ldots, k-1]$ corresponding
to each block. Thus the first level consists of a table B of size $b^k$, where each
entry is a $\lceil\lg k\rceil$-bit number. The index stored for an empty block is 0. For a
non-empty block, we store the height $\le k - 1$ of its ancestor x (excluding the
root and the first-level nodes of the tree) of maximum height such that the
total number of elements falling into all the blocks in the subtree rooted at node
x is more than […]. This will be a number in the range $[0, \ldots, k-1]$.

In the second level we store a number in the range $[1, \ldots, \lceil n^{1/k}\rceil - 1]$
corresponding to each block. Thus this level consists of a table T of size $b^k$, each
entry of which is a $\lceil\lg n^{1/k}\rceil$-bit number. The number stored for an empty block
is 0. For a non-empty block, we store the following.

Observe that given any node x at height h which has at most […] elements
from the set, the number of its children which have more than […]
elements from the set is less than $n^{1/k}$. Suppose the index stored for a block is
l. It means that the ancestor x of that block at height l has more than […]
elements and the ancestor y at height $l + 1$ has at most […] elements. Hence
y can have fewer than $n^{1/k}$ children which have more than […] elements. Call
these the ‘large’ children. With all the leaves rooted at each large child of y, we
store the rank of that child among all large children (from left to right) in the
second level.

In the third level, we have k tables, each of size $\lceil n^{1/k}\rceil\cdot m/b$ bits. The ith
table stores the representations of all blocks whose first-level entry (in table B)
is i. We think of the ith table as a set of bit vectors, each of length m/b.
Each of these bit vectors in the ith table stores the characteristic vector of a
particular child for each node at height i of the tree, in left-to-right order.
For each block (of size b) with first-level entry i and second-level entry j, we
store the characteristic vector of that block in the jth bit vector of the ith table,
at the location corresponding to its block of size […]. We store zeroes (i.e., the
characteristic vector of an empty block of appropriate size) at all other locations
not specified above.

Every element $x \in [m]$ is associated with $k + 2$ locations b(x), t(x) and
$t_i(x)$ for $0 \le i \le k - 1$, as defined below: $b(x) = t(x) = div(x, b)$, $t_i(x) =
mod(div(x, […]) + mod(x, […]), b^k)$.

Given an element x, the query scheme first reads $i = B(b(x))$ and $j = T(t(x))$
from the first two levels of the structure. If $j = 0$, it answers ‘No’. Otherwise,
it reads the jth bit in the table entry at location $t_i(x)$ in table $T_i$ and answers
‘Yes’ if and only if it is 1.

The space required for the structure is $s = b^k(\lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil +
k\lceil n^{1/k}\rceil)$ bits. Substituting $b = m^{1/(k+1)}$ makes the space complexity
$m^{k/(k+1)}(\lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil + k\,n^{1/k})$. The number of probes required to answer a
query is $t = \lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil + 1$. □
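To get a feel for the trade-off in Theorem 4, the two formulas can simply be evaluated. This is arithmetic on the stated bounds only: it ignores the rounding of $b = m^{1/(k+1)}$ to an integer, uses the $k\lceil n^{1/k}\rceil$ term from the proof, and the function name `theorem4_bounds` is hypothetical.

```python
import math

def theorem4_bounds(m, n, k):
    """Probe count t and space s (in bits) from Theorem 4, for k >= 1."""
    lg_k = (k - 1).bit_length()                     # ceil(lg k) for k >= 1
    lg_n_over_k = math.ceil(math.log2(n) / k)       # ceil((1/k) lg n)
    t = lg_k + lg_n_over_k + 1
    s = m ** (k / (k + 1)) * (lg_k + lg_n_over_k + k * math.ceil(n ** (1 / k)))
    return t, s
```

For $m = 2^{40}$ and $n = 2^{10}$, $k = 1$ gives $t = 11$ and $k = 2$ gives $t = 7$; in both cases s is well below m.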



One can slightly improve the space complexity of the above structure by
choosing non-uniform block sizes and making the block sizes (the branching factors
at each level of the above tree structure) a function of n. More precisely,
by choosing the branching factor of all the nodes at level i in the above tree
structure to be $b_i$, where $b_i$ is a suitable function of m, n, k and i […],
we get
Corollary 3. There is an explicit adaptive (n, m, s, t)-scheme with $t = \lceil\lg k\rceil +
\lceil\frac{1}{k}\lg n\rceil + 1$ and $s = (k + […])$, for $k \ge 1$.

By setting $k = \lg n$, we get

Corollary 4. There is an explicit adaptive (n, m, s, t)-scheme with $t = \lceil\lg\lg n\rceil
+ 2$ and $s = o(m)$ when n is […].

In the above adaptive scheme we first read $\lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil$ bits from the
structure, and depending on these bits we look at one more bit in the next level
to determine whether the query element is present. An obvious way to make this
scheme non-adaptive is to read the $\lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil$ bits and all $k\lceil n^{1/k}\rceil$ possible
bits in the next level, and determine membership accordingly. Thus we get
an explicit non-adaptive (n, m, s, t)-scheme with $t = \lceil\lg k\rceil + \lceil\frac{1}{k}\lg n\rceil + k\lceil n^{1/k}\rceil$
and the same space s as the adaptive scheme. By setting $k = \lceil\lg n\rceil$ in this, we get a non-adaptive scheme
with $t = O(\lg n)$ and $s = o(m)$.

These schemes give the best known explicit adaptive and non-adaptive 
schemes respectively for general n using o(m) bits. 



4 Lower Bounds 

Buhrman et al. [2] have shown that for any (n, m, s, t) scheme, s is $\Omega(nt\,m^{1/t})$. One
can achieve this bound easily for $n = 1$. They have also shown that for $n \ge 2$ any
two-probe non-adaptive scheme must use at least m bits of space. In this section,
we show a space lower bound of $\Omega(m^{2/3})$ bits for a restricted class of adaptive
schemes using two probes, for $n \ge 2$. Combined with the upper bound of
Theorem 2, this gives a tight lower bound for this class of restricted schemes. We
conjecture that the lower bound applies even to unrestricted schemes. We also
show a lower bound of $\Omega(m)$ bits for this restricted class of schemes for $n \ge 3$.

Any two-probe O(s)-bit adaptive scheme representing sets of size at most 2
from a universe U of size m can be assumed to satisfy the following conditions
(without loss of generality):

1. It has three tables A, B and C each of size s bits. 

2. Each $x \in U$ is associated with three locations a(x), b(x) and c(x).

3. On query x, the query scheme first looks at $A(a(x))$. If $A(a(x)) = 0$, then it
answers ‘Yes’ if and only if $B(b(x)) = 1$; else, if $A(a(x)) = 1$, then it answers
‘Yes’ if and only if $C(c(x)) = 1$.

4. Let $A_i = \{x \in [m] : a(x) = i\}$, $B_i = \{b(x) : x \in A_i\}$ and $C_i = \{c(x) : x \in
A_i\}$ for $1 \le i \le s$. For all $1 \le i \le s$, $|B_i| = |A_i|$ or $|A_i| = |C_i|$. That is, the
elements looking at a particular location in table A all look at distinct
locations in one of the tables B and C. (Otherwise, let $x, y, x', y' \in A_i$,
$x \ne y$ and $x' \ne y'$, be such that $b(x) = b(y)$ and $c(x') = c(y')$. Then we
cannot represent the set $\{x, x'\}$.)
5. Each location of A, B and C is looked at by at least two elements of the
universe, unless $s \ge m$. (If a location is looked at by only one element, then
that location can be set to 1 or 0 depending on whether the corresponding element
is present or not, so we can remove that location and the element from our
scheme.)

6. There are at most two ones in B and C put together. 

Define the following restrictions: 

— R1. For $x, y \in [m]$, $x \ne y$: $a(x) = a(y) \Rightarrow b(x) \ne b(y)$ and $c(x) \ne c(y)$.

— R2. For $i, j \in [s]$, $i \ne j$: $B_i \cap B_j \ne \emptyset \Rightarrow C_i \cap C_j = \emptyset$.

— R3. Either B or C is all zeroes.

We show that if an adaptive (2, m, s, 2) scheme satisfies R3 (or, equivalently,
R1 and R2, as we will show), then s is $\Omega(m^{2/3})$. Note that the scheme given in
Theorem 2 satisfies all three conditions. We then show that if an adaptive
(n, m, s, 2) scheme for $n \ge 3$ satisfies R3, then $s \ge m$.

Theorem 5. If an adaptive (2, m, s, 2) scheme satisfies condition R3, then s is
$\Omega(m^{2/3})$.

Proof. We first show that R3 $\Rightarrow$ (R1 and R2), and then that (R1 and R2)
$\Rightarrow$ s is $\Omega(m^{2/3})$.

Let $a(x) = a(y)$ and $b(x) = b(y)$ for $x, y \in [m]$, $x \ne y$. Consider an element
$z \ne x$ such that $c(x) = c(z)$ (such an element exists by condition 5 above). Now,
the set $\{y, z\}$ cannot be represented satisfying R3. The case $c(x) = c(y)$ is symmetric. Thus R3 $\Rightarrow$ R1.

Again, let $a(x_1) = a(x_2) = i$, $a(y_1) = a(y_2) = j$, $b(x_1) = b(y_1)$ and $c(x_2) =
c(y_2)$ (so that R2 is violated). Then the set $\{x_2, y_1\}$ cannot be represented
satisfying R3. Thus R3 $\Rightarrow$ R2.

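Observe that R1 implies

$$|A_i| = |B_i| = |C_i| \quad \text{for all } 1 \le i \le s. \tag{1}$$

Hence

$$\sum_{i=1}^{s} |B_i| = \sum_{i=1}^{s} |A_i| = m. \tag{2}$$

By R2, the sets $B_i \times C_i$ are disjoint (no pair occurs in two of these Cartesian
products), and all of them are subsets of $[s] \times [s]$. Thus, by Equation (1), $\sum_{i=1}^{s} |B_i|^2 \le s^2$. By Cauchy–Schwarz,
$s\bigl(\sum_{i=1}^{s} |B_i|/s\bigr)^2 \le \sum_{i=1}^{s} |B_i|^2 \le s^2$. By Equation (2), $\sum_{i} |B_i| = m$. Thus
$m^2/s \le s^2$, i.e., $s \ge m^{2/3}$. □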

Remark: We observe that, in fact, condition R3 is equivalent to R1 and R2.
To show this, it is enough to prove that (R1 and R2) $\Rightarrow$ R3. We argue that any
scheme that satisfies R1 and R2 can be converted into a scheme that satisfies
R3 as well.
Consider any scheme which satisfies R1 and R2 but not R3. Then there exists
a set $\{x, y\}$ with $a(x) \ne a(y)$ which the scheme stores as follows
(without loss of generality): $A(a(x)) = 0$, $A(a(y)) = 1$, $B(b(x)) = 1$, $C(c(y)) = 1$,
$A(a(z)) = 1$ for all z for which $b(z) = b(x)$, $A(a(z)) = 0$ for all z for which
$c(z) = c(y)$, and all other locations are zeroes.

Let $a(x) = i$ and $a(y) = j$. If $B_i \cap B_j = \emptyset$, then we can store this set as
follows: $A(a(x)) = A(a(y)) = 0$, $B(b(x)) = B(b(y)) = 1$, all other entries in
A set to 1, and all other entries in B and C set to 0, satisfying R3. Condition R1 (and
the fact that $B_i \cap B_j = \emptyset$) ensures that this is a valid scheme to represent the
set $\{x, y\}$.

If $B_i \cap B_j \ne \emptyset$, then R2 ensures that $C_i \cap C_j = \emptyset$. In this case, to store the
set $\{x, y\}$ we can set $A(a(x)) = A(a(y)) = 1$, $C(c(x)) = C(c(y)) = 1$ and all
other entries to zero, satisfying R3.

We now show the following. 

Theorem 6. If an adaptive (n, m, s, 2) scheme, for $n \ge 3$, satisfies condition
R3, then $s \ge m$.

Proof. We first observe that any two-probe adaptive scheme satisfies conditions
1 to 5 of the adaptive schemes for sets of size at most 2. Consider an adaptive
(3, m, s, 2) scheme with $s < m$. One can find five elements x, y, y', z and z' of
the universe such that $a(y) = a(y')$, $a(z) = a(z')$, $b(x) = b(y)$ and $c(x) = c(z)$.
(Start by fixing x, y, z, and then fix y' and z'.) The existence of such a configuration is
guaranteed by condition 5, as $s < m$. Then we cannot represent the set $\{x, y', z'\}$
satisfying R3, contradicting the assumption. Hence, $s \ge m$. □

5 Conclusions 

We have given several deterministic explicit schemes for the membership problem
in the bit-probe model for small values of t. Our main goal is to achieve o(m)
bits of space and answer queries using as few probes as possible. We could achieve
$\lceil\lg\lg n\rceil + 2$ adaptive probes through an explicit scheme, though it is known
(probabilistically) that one can get an o(m)-bit structure which uses only 5 probes
to answer queries. It is a challenging open problem to come up with an explicit
scheme achieving this bound. We conjecture that one cannot get a three-probe
o(m)-bit structure.

One can also fix some space bound and ask for the least number of probes
required to answer the queries. For example, if $s = O(n\sqrt{m})$, Theorem 1 gives a
$(\lceil\lg(n+1)\rceil + 1)$-probe adaptive scheme. It would be interesting to see if this can be
improved. Also, this scheme immediately gives an $(n + O(\lg n))$-probe non-adaptive
scheme with the same space bound. Demaine et al. have improved this to an
$O(\sqrt{n\lg n})$-probe non-adaptive scheme with $s = O(\sqrt{mn\lg n})$.
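The remark that Theorem 1 yields an $(n + O(\lg n))$-probe non-adaptive scheme can be made concrete: the query reads the $\lceil\lg(n+1)\rceil$ bits of its table entry together with the bit at offset $mod(x, b)$ in every one of the n second-part vectors, all addresses being fixed before any bit is seen. This sketch assumes a 0-indexed universe and pads the second part to exactly n vectors (unused ones all zero); the function names are hypothetical.

```python
import math

def build(S, m, n):
    """Theorem 1 layout, padded to n second-part vectors. Returns (bits, b, w, nblocks)."""
    S = set(S)
    b = math.isqrt(m - 1) + 1                       # block size, ceil(sqrt(m))
    w = max(1, n.bit_length())                      # ceil(lg(n+1)) bits per table entry
    nblocks = (m + b - 1) // b
    nonempty = sorted({x // b for x in S})
    rank = {blk: r for r, blk in enumerate(nonempty, start=1)}
    bits = []
    for blk in range(nblocks):                      # table T
        r = rank.get(blk, 0)
        bits.extend((r >> (w - 1 - i)) & 1 for i in range(w))
    for j in range(n):                              # n vectors; unused ones stay all-zero
        blk = nonempty[j] if j < len(nonempty) else None
        bits.extend(1 if blk is not None and blk * b + off in S else 0 for off in range(b))
    return bits, b, w, nblocks

def nonadaptive_query(bits, b, w, nblocks, n, x):
    """Probe addresses depend only on x: w entry bits plus one candidate bit per vector."""
    base2 = nblocks * w
    addrs = [(x // b) * w + i for i in range(w)]
    addrs += [base2 + j * b + x % b for j in range(n)]
    vals = [bits[a] for a in addrs]                 # all w + n probes, chosen in advance
    rank = 0
    for v in vals[:w]:
        rank = (rank << 1) | v
    return (rank != 0 and vals[w + rank - 1] == 1), len(addrs)
```

Every query makes exactly $\lceil\lg(n+1)\rceil + n$ probes, and the space stays at most $(n + \lceil\lg(n+1)\rceil)\lceil\sqrt{m}\rceil$ bits.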
Abstract. Bloom filtering is an important technique for space efficient
storage of a conservative approximation of a set S. The set stored may 
have up to some specified number of “false positive” members, but all 
elements of S are included. In this paper we consider lossy dictionaries 
that are also allowed to have “false negatives”. The aim is to maximize 
the weight of included keys within a given space constraint. This relax- 
ation allows a very fast and simple data structure making almost optimal 
use of memory. Being more time efficient than Bloom filters, we believe 
our data structure to be well suited for replacing Bloom filters in some 
applications. Also, the fact that our data structure supports informa- 
tion associated to keys paves the way for new uses, as illustrated by an 
application in lossy image compression. 



1 Introduction 

Dictionaries are part of many algorithms and data structures. A dictionary
provides access to information indexed by a set S of keys. Given a key, it returns
the associated information or reports that the key is not in the set. In this paper 
we will not be concerned with updates, i.e., we consider the static dictionary 
problem. The main parameters of interest are of course the space used by the 
dictionary and the time for looking up information. We will assume keys as well 
as the information associated with keys to have a fixed size. 

A large literature has grown around the problem of constructing efficient dic- 
tionaries, and theoretically satisfying solutions have been found. Often a slightly 
easier problem has been considered, namely the membership problem, which is 
the dictionary problem without associated information. It is usually easy to de- 
rive a dictionary from a solution to the membership problem, using extra space 
corresponding to the associated information. In this paper we are particularly 
interested in dictionary and membership schemes using little memory. Let n 
denote the size of the key set S. It has been shown that when keys are w-bit 
machine words, lookups can be performed in constant time in a membership 
data structure occupying $B + o(B)$ bits of memory, where $B = \lceil\log\binom{2^w}{n}\rceil$ is the
minimum amount of memory needed to be able to represent any subset of size n
(logarithms in this paper are base 2). However, constant factors in the
lower-order term and lookup time make this and similar schemes less than one could
hope for from an applied point of view. Also, difficulty of implementation is an
obstacle to practical use. In total, current schemes with asymptotically optimal
space usage appear to be mainly of theoretical interest.

If one relaxes the requirements on the membership data structure, allowing
it to store a slightly different key set than intended, new possibilities arise.
A technique finding many applications in practice is Bloom filtering. This
technique allows space-efficient storage of a superset S' of the key set S, such
that $S' \setminus S$ is no more than an $\varepsilon$ fraction of $\{0,1\}^w$. For $n \ll 2^w$, about $\log(1/\varepsilon)$
bits per key in S are necessary and sufficient for this. This is a significant
saving compared to a membership data structure using $B \approx n\log(2^w e/n)$ bits.
Lookup of a key using Bloom filtering requires $O(\log(1/\varepsilon))$ memory accesses and
is thus relatively slow compared to other hashing schemes when $\varepsilon$ is small. Bloom
filtering has applications in, for example, cooperative caching and differential
files, where one wants no more than a small chance that an expensive operation
is performed in vain. Bloom filtering differs from most other hashing techniques
in that it does not yield a solution to the dictionary problem.
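For comparison, here is a minimal Bloom filter sketch with the classical parameter choice: $\lceil\log_2(1/\varepsilon)\rceil$ probes per lookup and roughly $1.44\log_2(1/\varepsilon)$ bits per key. The hashing (k positions derived from one SHA-256 digest by double hashing) is an assumption of this sketch, not something prescribed by the text, and the class name is hypothetical. Note that it only answers membership; there is no place to store associated information, which is the limitation mentioned above.

```python
import hashlib
import math

class BloomFilter:
    """Bloom filter with k = ceil(log2(1/eps)) probes and about k / ln 2 bits per key."""

    def __init__(self, n, eps):
        self.k = max(1, math.ceil(math.log2(1 / eps)))            # hash functions = probes per lookup
        self.size = max(1, math.ceil(n * self.k / math.log(2)))   # total number of bits
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Double hashing from one digest: position_i = h1 + i*h2 (mod size).
        d = hashlib.sha256(str(key).encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        # Members always pass all k probes (no false negatives); non-members usually hit a zero.
        return all(self.bits[p >> 3] >> (p & 7) & 1 for p in self._positions(key))
```

With $n = 1000$ and $\varepsilon = 0.01$, this uses $k = 7$ probes and about 10,100 bits; the observed false-positive rate on unseen keys should be around 1%.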

1.1 This Paper 

In this paper we introduce the concept of lossy dictionaries that can have not 
only false positives (like Bloom filters), but also false negatives. That is, some 
keys in S (with associated information) are thrown away when constructing the 
dictionary. For false positives there is no guarantee on the associated information 
returned. We let each key in S have a weight, and try to maximize the sum of 
weights of keys in the dictionary under a given space constraint. This problem, 
with no false positives allowed, arises naturally in lossy image compression, and 
is potentially interesting for caching applications. Also, a dictionary with two- 
sided errors could take the place of Bloom filters in cooperative caching. 

We study this problem on a unit cost RAM, in the case where keys are 
machine words of w bits. We examine a very simple and efficient data structure 
from a theoretical as well as an experimental point of view. Experimentally 
we find that our data structure has surprisingly good behavior with respect 
to keeping the keys of largest weight. The experimental results are partially 
explained by our theoretical considerations, under strong assumptions on the 
hash functions involved. Specifically, we assume in our RAM model that for 
a number of random functions, arbitrary function values can be returned in 
constant time by an oracle. 

1.2 Related Work 

Most previous work related to static dictionaries has considered the membership problem on a unit cost RAM with word size w. The first membership data structure with worst case constant lookup time using O(n) words of space was constructed by Fredman et al. [?]. For constant δ > 0, the space usage is O(B) when 2^w > n^{1+δ}, but in general the data structure may use space Ω(Bw). The space usage has been lowered to B + o(B) bits by Brodnik and Munro [?]. The lower order term was subsequently improved to o(n) + O(log w) bits by the first author [?]. The main concept used in the latter paper is that of a quotient function q of a hash function h, defined simply to be a function such that the mapping k ↦ (h(k), q(k)) is injective.

The membership problem with false positives was first considered by Bloom [?]. Apart from Bloom filtering, the paper presents a less space-efficient data structure that is readily turned into a lossy dictionary with only false positives. However, the space usage of the derived lossy dictionary is not optimal. Carter et al. [?] provided a lower bound of n log(1/ε) bits on the space needed to solve membership with an ε fraction of false positives, for n ≪ 2^w, and gave data structures with various lookup times matching or nearly matching this bound. Though none of these membership data structures has constant lookup time, such a data structure follows by plugging the abovementioned results on space-optimal membership data structures [?] into a general reduction provided in [?]. In fact, the dictionary of [?] can easily be modified into a lossy dictionary with false positives, thus also supporting associated information, using O(n + log w) bits more than the lower bound.

Another relaxation of the membership problem was recently considered by Buhrman et al. [?]. They store the set S exactly, but allow the lookup procedure to use randomization and to have some probability of error. For two-sided error ε they show that there exists a data structure of O(nw/ε²) bits in which lookups can be done using just one bit probe. To do the same without false negatives it is shown that O(n²w/ε²) bits suffice and that this is essentially optimal. Schemes using more bit probes and less space are also investigated. If one fixes the random bits of the lookup procedure appropriately, the result is a lossy dictionary with error ε. However, it is not clear how to fix the parameters in a reasonable model of computation.



2 Lossy Dictionaries 

Consider a set S containing keys k_1, ..., k_n with associated information a_1, ..., a_n and positive weights v_1, ..., v_n. Suppose we are given an upper bound m on available space and an error parameter ε. The lossy dictionary problem for ε = 0 is to store a subset of the keys in S and corresponding associated information in a data structure of m bits, trying to optimize the sum of weights of included keys, referred to as the value. For general ε we also allow the dictionary to contain 2^w ε keys from the complement of S. In this section we show the following theorem.

Theorem 1. Let a sequence of keys k_1, ..., k_n ∈ {0,1}^w, associated information a_1, ..., a_n ∈ {0,1}^l, and weights v_1 ≥ ... ≥ v_n > 0 be given. Let r > 0 be an even integer, and b ≥ 0 an integer. Suppose we have oracle access to random functions h_1, h_2 : {0,1}^w → {1, ..., r/2} and corresponding quotient functions q_1, q_2 : {0,1}^w → {0,1}^s \ {0^s}. There is a lossy dictionary with the following properties:

1. The space usage is r(s − b + l) bits (two tables with r/2 cells of s − b + l bits).

2. The fraction of false positives is bounded by ε ≤ (2^b − 1) r/2^w.

3. The expected weight of the keys in the set stored is Σ_{i=1}^{n} p_{r,i} v_i, where

$$
p_{r,i} \ge
\begin{cases}
1 - \dfrac{52\,r^{-1}}{r/(2i) - 1}, & \text{for } i < r/2,\\[6pt]
2(1 - 2/r)^{i-1} - (1 - 2/r)^{2(i-1)}, & \text{for } i > r/2,
\end{cases}
$$

is the probability that k_i is included in the set.

4. Lookups are done using at most two (independent) accesses to the tables.

5. It can be constructed in time O(n log* n + rl/w).



As discussed in Sect. 2.1, there exist quotient functions for s = w − log r + O(1) if the hash functions map approximately the same number of elements to each value in {1, ..., r/2}. The inequality in item 2 is satisfied for b = ⌊log(2^w ε/r + 1)⌋, so for s = w − log r + O(1) an ε fraction of false positives can be achieved using space r(log(1/ε) + l + O(1)). As can be seen from item 3, almost all of the keys {k_1, ..., k_{r/2}} are expected to be included in the set represented by the lossy dictionary. For i > r/2 our bound on p_{r,i} is shown in Fig. 1 of Sect. 3 together with experimentally observed probabilities. If n ≥ r and r is large enough, it can be shown by integration that, in the expected sense, more than 70% of the keys from {k_1, ..., k_r} are included in the set (our experiments indicate 84%). We show in Sect. 2.5 that the amount of space we use to achieve this is within a small constant factor of optimal.
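The parameter choice for item 2 is easy to check mechanically. The following Python sketch (all function names are ours) computes the largest b with (2^b − 1)r/2^w ≤ ε, which coincides with b = ⌊log(2^w ε/r + 1)⌋, together with the space formula of item 1. It only evaluates the formulas of Theorem 1; it does not build a dictionary.

```python
def truncation_bits(w: int, r: int, eps: float) -> int:
    """Largest b >= 0 with (2^b - 1) * r / 2^w <= eps, i.e. b = floor(log2(2^w * eps / r + 1))."""
    b = 0
    while (2 ** (b + 1) - 1) * r <= eps * 2 ** w:
        b += 1
    return b


def false_positive_bound(w: int, r: int, b: int) -> float:
    # Item 2: at most 2^b - 1 false positives per cell, r cells, out of 2^w possible keys.
    return (2 ** b - 1) * r / 2 ** w


def space_bits(r: int, s: int, b: int, l: int) -> int:
    # Item 1: two tables of r/2 cells, each holding s - b quotient bits and l information bits.
    return r * (s - b + l)


b = truncation_bits(w=32, r=2**20, eps=2**-10)
print(b, false_positive_bound(32, 2**20, b))
```

For w = 32, r = 2^20 and ε = 2^−10 this gives b = 2, i.e. a guaranteed false positive fraction of at most 3/4096 ≈ 0.00073, while b = 3 would already exceed ε.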

Note that by setting b = 0 we obtain a lossy dictionary with no false positives. Another point is that given a desired maximum space usage m and false positive fraction ε, the largest possible size r of the tables can be chosen efficiently.



2.1 Preliminaries 

The starting point for the design of our data structure is a static dictionary recently described in [?]. In this dictionary, two hash tables T_1 and T_2 are used together with two hash functions h_1, h_2 : {0,1}^w → {1, ..., r/2}, where r denotes the combined size of the hash tables, assumed to be even. A key x ∈ S is stored in either cell h_1(x) of T_1 or cell h_2(x) of T_2. It was shown that if r ≥ (2 + δ)n, for δ > 0, and h_1, h_2 are random functions, there exists a way of arranging the keys in the tables according to the hash functions with probability at least 1 − 52/(δr). For small δ this gives a dictionary utilizing about 50% of the hash table cells. The arrangement of keys was shown to be computable in expected linear time.

Another central concept is that of quotient functions. Recall that a quotient function q of a hash function h is a function such that the mapping k ↦ (h(k), q(k)) is injective. When storing a key k in cell h(k) of a hash table, it is sufficient to store q(k) to uniquely identify k among all other elements hashing to h(k). To mark empty cells one needs a bit string not mapped to by the quotient function, e.g., 0^s for the quotient functions of Theorem 1. The idea of using quotient functions is, of course, that storing q(k) may require fewer bits than storing k itself. If a fraction O(1/r) of all possible keys hashes to each of r hash table cells, there is a quotient function whose function values can be stored in w − log r + O(1) bits. This approach was used in [?] to construct a dictionary using space close to the information-theoretic minimum. As an example, we look at a hash function family from [?] mapping from {0,1}^w to {0,1}^t. It contains functions of the form h_a(k) = (ak mod 2^w) div 2^{w−t} for a odd and 0 < a < 2^w. Letting bit masks and shifts replace modulo and division, these hash functions can be evaluated very efficiently. A corresponding family of quotient functions is given by q_a(k) = (ak mod 2^w) mod 2^{w−t}, whose function values can be stored in w − t bits, i.e., w − log r bits for a table of r = 2^t cells.
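A small Python sketch of the multiplicative family and its quotient functions, using a toy word length w = 16 (an assumption made so that injectivity can be checked over all keys). Because a is odd, k ↦ ak mod 2^w is a bijection; h_a keeps the top t bits of the product and q_a the remaining w − t bits, so (h_a(k), q_a(k)) determines ak mod 2^w and hence k. The helper `recover` (our name) makes this explicit by multiplying with the inverse of a modulo 2^w; the three-argument `pow(a, -1, m)` needs Python 3.8 or later.

```python
W = 16  # toy word length w (assumption, so that every key can be checked)


def h(a: int, k: int, t: int, w: int = W) -> int:
    # h_a(k) = (a*k mod 2^w) div 2^(w-t): the top t bits of the w-bit product.
    return ((a * k) % (1 << w)) >> (w - t)


def q(a: int, k: int, t: int, w: int = W) -> int:
    # q_a(k) = (a*k mod 2^w) mod 2^(w-t): the remaining low w-t bits.
    return ((a * k) % (1 << w)) & ((1 << (w - t)) - 1)


def recover(a: int, hv: int, qv: int, t: int, w: int = W) -> int:
    # a is odd, hence invertible mod 2^w: rebuild a*k mod 2^w from its two parts and undo it.
    product = (hv << (w - t)) | qv
    return (product * pow(a, -1, 1 << w)) % (1 << w)


a, t = 40503, 6  # hypothetical odd multiplier; 2^t = 64 cells
pairs = {(h(a, k, t), q(a, k, t)) for k in range(1 << W)}
print(len(pairs) == 1 << W)  # the mapping k -> (h_a(k), q_a(k)) is injective
```

Only masks, shifts and one multiplication are needed per evaluation, which is why this family is attractive in practice.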



2.2 Our Data Structure 

The idea behind our lossy dictionary, compared to the static dictionary of [?] described above, is to try to fill the hash tables almost completely, working with key sets of size similar to or larger than r. Each key has two hash table cells to which it can be matched. Thus, given a pair of hash functions, the problem of finding a maximum value subset of S that can be arranged into the hash tables is a maximum weight matching problem that can be solved in polynomial time, see e.g. [?]. In Sect. 2.3 we will present an algorithm that finds such an optimal solution in time O(n log* n), exploiting structural properties. The term O(rl/w) in the time bound of Theorem 1 is the time needed to copy associated information to the tables. Assume for now that we know which keys are to be represented in which hash table cells.

For b = 0 we simply store quotient function values in nonempty hash table cells and 0^s in empty hash table cells, using s bits per cell. For general b we store only the first s − b bits. Observe that no more than 2^b keys with the same hash function value can share the first s − b bits of the quotient function value. This means that there are at most 2^b − 1 false positives for each nonempty cell. Since 0^s is not in the range, this is also true for empty cells. In addition to the s − b bits, we use l bits per cell to store associated information.
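The counting argument can be checked by brute force on a toy instance. The Python sketch below uses the multiplicative family from Sect. 2.1 with w = 12 and 2^t = 16 cells and, as an assumption of the demo, shifts the quotient up by one so that 0^s never occurs (this costs one bit, s = w − t + 1). Grouping all 2^w keys by (cell, first s − b quotient bits) shows that at most 2^b keys share the contents of a nonempty cell and at most 2^b − 1 keys match the contents 0^{s−b} of an empty cell. All parameter values are hypothetical.

```python
from collections import Counter

# Demo parameters (all hypothetical): w-bit keys, 2^t cells per table, b truncated bits.
W, T_BITS, B = 12, 4, 3
A = 2741  # odd multiplier for h_a / q_a


def h(k):
    # cell index: the top t bits of a*k mod 2^w
    return ((A * k) % (1 << W)) >> (W - T_BITS)


def q(k):
    # quotient: the low w - t bits of a*k mod 2^w, plus 1 so that 0^s is never produced
    return ((A * k) % (1 << (W - T_BITS))) + 1


def stored(k):
    # only the first s - b bits of the quotient are kept in the cell
    return q(k) >> B


# Group all 2^w possible keys by the (cell, truncated quotient) they would leave behind.
groups = Counter((h(k), stored(k)) for k in range(1 << W))
worst_nonempty = max(groups.values())                                # keys sharing one cell content
worst_empty = max(groups.get((c, 0), 0) for c in range(1 << T_BITS))  # keys matching an empty cell
print(worst_nonempty - 1, worst_empty, 2 ** B - 1)
```

Both counts meet the bound 2^b − 1 with equality here, since every block of 2^b consecutive quotient values is fully used except the block starting at 0^s.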

We now proceed to fill in the remaining details on items 3 and 5 of Theorem 1.



2.3 Construction Algorithm 

First note that it may be assumed without loss of generality that weights are distinct. If there are consecutive equal weights v_j = ... = v_k, we can imagine making them distinct by adding positive integer multiples of some quantity δ much smaller than the difference between any pair of achievable values. For sufficiently small δ, the relative ordering of the values of any two solutions with distinct values will not change.
When weights are distinct, the set of keys in an optimal solution is unique, as shown in the following lemma:

Lemma 2. Suppose that weights are distinct. For 1 ≤ i ≤ n, an optimal solution includes key k_i if and only if there is an optimal solution for the set {k_1, ..., k_{i−1}} such that cell h_1(k_i) of T_1 or cell h_2(k_i) of T_2 is empty. In particular, the set of keys included in an optimal solution is unique.

Proof. We proceed by induction on n. For n = 1 the claim is obvious. In general, the claim follows for i < n by using the induction hypothesis on the set {k_1, ..., k_i}. For i = n consider the unique set K of keys included in an optimal solution for {k_1, ..., k_{n−1}}. If there is an arrangement of K not occupying both cell h_1(k_n) of T_1 and cell h_2(k_n) of T_2, we may add the key k_n to the arrangement, which must yield an optimal arrangement. On the other hand, if all arrangements of K occupy both cell h_1(k_n) of T_1 and cell h_2(k_n) of T_2, there is no way of including k_n without discarding a key in K. However, this would yield a lower value and hence cannot be optimal. □

Given hash functions h_1 and h_2 and a key set K, we define the bipartite graph G(K) with vertex set {1,2} × {1, ..., r/2}, corresponding in a natural way to hash table cells, and the multiset of edges {{(1, h_1(k)), (2, h_2(k))} | k ∈ K}, corresponding to keys. Note that there may be parallel edges if several keys have the same pair of hash function values. We will use the terms keys/edges and cells/vertices synonymously. Define a connected component of G(K) to be saturated if the number of edges is greater than or equal to the number of vertices, i.e., if it is not a tree. We have the following characterization of the key set of the optimal solution:

Lemma 3. Suppose that weights are distinct. The optimal solution includes key k_i if and only if at least one of (1, h_1(k_i)) and (2, h_2(k_i)) is in a non-saturated connected component of G({k_1, ..., k_{i−1}}).

Proof. By Lemma 2 it is enough to show that the keys included in an optimal solution for a set of keys K can be arranged such that cell z is empty if and only if the connected component of z in G(K) is not saturated. Consider the key set K' in an optimal solution for K. For every subset H ⊆ K' it must hold that |h_1(H)| + |h_2(H)| ≥ |H|, since otherwise not all keys could have been placed. Thus, a connected component with key set C is saturated if and only if |h_1(C ∩ K')| + |h_2(C ∩ K')| = |C ∩ K'|. In particular, when arranging the keys of C ∩ K', where C is the set of keys of a saturated component, every cell in the connected component must be occupied. On the other hand, suppose there is no arrangement of K' such that z, say cell number i of T_1, is empty. Hall's theorem says that there must be a set H ⊆ K' such that |h_1(H) \ {i}| + |h_2(H)| < |H|. In fact, as no other connected components are affected by blocking z, we may choose H as a subset of the keys in the connected component of z. But then the graph of edges in H must contain a cycle, meaning that the connected component is saturated. The case of z being in T_2 is symmetrical. □
The lemma implies that the following algorithm finds the key set S' of the optimal solution, given keys sorted according to decreasing weight.

1. Initialize a union-find data structure for the cells of the hash tables.

2. For each equivalence class, set a "saturated" flag to false.

3. For i = 1, ..., n:

a) Find the equivalence class c_b of cell h_b(k_i) in T_b, for b = 1, 2.

b) If c_1 or c_2 is not saturated:

i. Include k_i in the solution.

ii. Join c_1 and c_2 to form an equivalence class c.

iii. Set the saturated flag of c if c_1 = c_2 or if the flag is set for c_1 or c_2.
In the loop, equivalence classes correspond to the connected components of the graph G({k_1, ..., k_{i−1}}). There is a simple implementation of a union-find data structure for which operations take O(log* n) amortized time; see [?], which actually gives an even better time bound.

What remains is arranging the optimal key set S' in the tables. Consider a vertex in G(S') of degree one. It is clear that there must be an arrangement such that the corresponding cell contains the key of the incident edge. Thus, one can iteratively handle edges incident to vertices of degree one and delete them. As we remove the same number of edges and vertices from each connected component, the remaining graph consists of connected components with no more edges than vertices and no vertices of degree one, i.e., cycles. The arrangement of edges in a cycle follows as soon as one key has been put (arbitrarily) into one of the tables. The above steps are easily implemented to run in linear time. This establishes item 5 of Theorem 1.
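The two phases described above (peeling vertices of degree one, then orienting the remaining cycles) can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: the input is the already selected key set S' given as 0-based hash pairs, and every connected component has no more keys than cells, which holds for the optimal solution by Lemma 3. The function name and vertex numbering are hypothetical.

```python
from collections import defaultdict


def arrange(edges, half):
    """Place the selected keys into cells; returns {vertex: key}.

    edges[key] = (h1, h2) with 0 <= h1, h2 < half; cell (T1, c) is vertex c and
    cell (T2, c) is vertex half + c.
    """
    incident = defaultdict(set)
    for key, (a, b) in edges.items():
        incident[a].add(key)
        incident[half + b].add(key)

    remaining = set(edges)
    placement = {}

    def live(v):
        return [k for k in incident[v] if k in remaining]

    def other_end(key, v):
        a, b = edges[key]
        return half + b if v == a else a

    # Phase 1: a vertex of degree one receives the key of its only incident edge.
    stack = [v for v in incident if len(incident[v]) == 1]
    while stack:
        v = stack.pop()
        keys = live(v)
        if len(keys) != 1:
            continue
        key = keys[0]
        placement[v] = key
        remaining.discard(key)
        u = other_end(key, v)
        if len(live(u)) == 1:
            stack.append(u)

    # Phase 2: the rest are disjoint cycles; put one key into T1 and walk around the cycle.
    for start in list(remaining):
        if start not in remaining:
            continue
        key, v = start, edges[start][0]
        while True:
            placement[v] = key
            remaining.discard(key)
            v = other_end(key, v)
            keys = live(v)
            if not keys:
                break
            key = keys[0]
    return placement


example = {"a": (0, 0), "b": (0, 1), "c": (1, 1), "d": (1, 0), "e": (2, 0)}  # a cycle plus a pendant key
print(arrange(example, half=3))
```

Every key is handled a constant number of times apart from the degree scans, in line with the linear running time claimed above.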

2.4 Quality of Solution 

We now turn to the problem of estimating the quality of the solution. Again we will use the fact that weights can be assumed to be unique. A consequence of Lemma 3 is that the set of keys in an optimal solution does not depend on the actual weights, but only on the sequence of hash function values. Thus, the expected value of the optimal solution is Σ_{i=1}^{n} p_{r,i} v_i, where p_{r,i} is the probability that the ith key is included in the optimal set of keys.

Lemma 3 says that if {k_1, ..., k_i} can be accommodated under the given hash functions, they are included in an optimal solution. Using the earlier mentioned result of [?] on {k_1, ..., k_i} with δ = r/(2i) − 1, we have that for i < r/2 this happens with probability at least 1 − 52r^{−1}/(r/(2i) − 1). In particular, p_{r,i} is at least this big.

For i > r/2 we derive a lower bound on p_{r,i} as follows. If one of the vertices (1, h_1(k_i)) and (2, h_2(k_i)) in G({k_1, ..., k_{i−1}}) is isolated, it follows by Lemma 3 that k_i is in the optimal solution. Under the assumption that hash function values are truly random, G({k_1, ..., k_{i−1}}) has i − 1 randomly and independently chosen edges, each of which misses a fixed cell of T_1 (respectively T_2) with probability 1 − 2/r, and the events for the two cells concern independent hash functions. Thus, we have the bound

$$p_{r,i} \ge 1 - \bigl(1 - (1 - 2/r)^{i-1}\bigr)^2 = 2(1 - 2/r)^{i-1} - (1 - 2/r)^{2(i-1)} \approx 2e^{-2i/r} - e^{-4i/r}.$$

This concludes the proof of Theorem 1.
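The 70% figure quoted after Theorem 1 follows from integrating this bound. A short Python check, under the simplifying assumption that p_{r,i} is taken to be 1 for i ≤ r/2 (which Theorem 1 gives asymptotically): with i = αr the bound tends to 2e^{−2α} − e^{−4α}, and 1/2 + ∫_{1/2}^{1} (2e^{−2α} − e^{−4α}) dα = 1/2 + e^{−1} − e^{−2} − (e^{−2} − e^{−4})/4 ≈ 0.703. The function names are ours.

```python
import math


def lower_bound_p(i: int, r: int) -> float:
    """Theorem 1 bound for i > r/2: 2(1 - 2/r)^(i-1) - (1 - 2/r)^(2(i-1))."""
    q = (1 - 2 / r) ** (i - 1)
    return 2 * q - q * q


def expected_fraction_kept(r: int) -> float:
    # Expected fraction of the r heaviest keys kept, with p_{r,i} taken as 1 for i <= r/2
    # (asymptotic simplification) and the bound above for r/2 < i <= r.
    tail = sum(lower_bound_p(i, r) for i in range(r // 2 + 1, r + 1))
    return (r // 2 + tail) / r


# Limit as r grows: 1/2 + integral over [1/2, 1] of (2e^{-2a} - e^{-4a}) da.
limit = 0.5 + (math.exp(-1) - math.exp(-2)) - (math.exp(-2) - math.exp(-4)) / 4
print(round(limit, 4), round(expected_fraction_kept(2048), 4))
```

The finite sum for r = 2048 already agrees with the limit to within a fraction of a percent, well below the 84% observed experimentally.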
2.5 A Lower Bound



This section gives a lower bound on the amount of memory needed by a lossy dictionary with an ε fraction of false positives and γn false negatives. Our proof technique is similar to that used for the lower bound in [?] for the case γ = 0.

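Proposition 4. For 0 < ε < 1/2 and 0 < γ < 1, a lossy dictionary representing a set S ⊆ {0,1}^w of n ≤ 2^{w−1} keys with at most 2^w ε false positives and at most γn false negatives must use space at least

$$(1-\gamma)\,n\log\frac{1}{\epsilon + n/2^w} - O(n) \text{ bits.}$$

Proof. We can assume without loss of generality that γn is an integer. Consider the set of all data structures used for the various subsets of n elements from {0,1}^w. Any of these data structures must represent a set of at most 2^w ε + n keys, in order to meet the requirement on the number of false positives. An n-element set having at most γn keys outside the set represented by a given data structure is determined by (1 − γ)n of its keys inside the represented set together with its remaining γn keys, so the number of such sets is at most $\binom{2^w\epsilon+n}{(1-\gamma)n}\binom{2^w}{\gamma n}$. To represent all $\binom{2^w}{n}$ key sets one therefore needs space at least

$$
\begin{aligned}
\log\binom{2^w}{n} - \log\left(\binom{2^w\epsilon+n}{(1-\gamma)n}\binom{2^w}{\gamma n}\right)
&\ge n\log\frac{2^w}{n} - (1-\gamma)n\log\frac{(2^w\epsilon+n)\,e}{(1-\gamma)n} - \gamma n\log\frac{2^w e}{\gamma n}\\
&= n\log\frac{1-\gamma}{e\,(\epsilon+n/2^w)} - \gamma n\log\frac{1-\gamma}{\gamma\,(\epsilon+n/2^w)},
\end{aligned}
$$

where e is the base of the natural logarithm and we used (N/k)^k ≤ $\binom{N}{k}$ ≤ (Ne/k)^k. The argument is concluded by first using γn log(1/γ) = O(n), then merging the two first terms, and finally using (1 − γ)n log(1 − γ) = O(n). □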

In the discussion following Theorem 1 we noted that if there are quotient functions with optimal range, the space usage of our scheme is n log(1/ε) + O(n) when tables of combined size n are used. The expected fraction γ of false negatives is less than 3/10 by Theorem 1. This means that our data structure uses within O(n) bits of 10/7 times the lower bound. The experiments described in Sect. 3 indicate that the true factor is less than 6/5.



2.6 Using More Tables 

We now briefly look at a generalization of the two-table scheme to schemes with more tables. Unfortunately the algorithm described in Sect. 2.3 does not seem to generalize to more than two tables. An optimal solution can again be found using maximum weight matching, but the time complexity of this solution is not attractive. Instead we can use a variant of the cuckoo scheme described by the authors in [?], attempting to insert keys in order k_1, ..., k_n. For two tables an insertion attempt for k_i works as follows: We store k_i in cell h_1(k_i) of T_1, pushing the previous occupant, if any, away and thus making it nestless. If cell h_1(k_i) was free we are done. Otherwise we insert the new nestless element in T_2, possibly pushing out another element. This continues until we either find a free cell or loop around unable to find a free cell, in which case k_i is discarded. It follows from [?] and the analysis in Sect. 2.3 that this algorithm finds an optimal solution, though not as efficiently as the algorithm given in Sect. 2.3. When using three or more tables it is not obvious in which of the tables one should attempt placing the "nestless" key. One heuristic that works well is to simply pick one of the two possible tables at random. It is interesting to compare this heuristic to a random walk on an expander graph, which will provably cross any large subset of the vertices with high probability.
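A minimal Python sketch of the two-table insertion attempt described above, assuming the tables are lists holding keys or None and hash values are 0-based; all names are hypothetical. It relies on the behaviour described in the text: if no free cell is reachable, the walk eventually pushes k_i out of both of its cells, at which point the other keys again occupy a valid arrangement and k_i is dropped. The step limit is only a defensive guard, not part of the method.

```python
def lossy_cuckoo_insert(tables, hashes, x):
    """Try to insert key x; return True if it is stored, False if it is discarded.

    tables = [T1, T2]: lists whose entries are keys or None.
    hashes[key] = (h1(key), h2(key)), 0-based cell indices.
    """
    nestless, side = x, 0  # x first goes to T1
    evictions_of_x = 0
    step_limit = 8 * (len(tables[0]) + len(tables[1])) + 8  # defensive guard only
    for _ in range(step_limit):
        cell = hashes[nestless][side]
        # Put the nestless key into its cell and pick up the previous occupant (if any).
        tables[side][cell], nestless = nestless, tables[side][cell]
        if nestless is None:
            return True  # reached a free cell
        if nestless == x:
            evictions_of_x += 1
            if evictions_of_x == 2:
                return False  # x lost both of its cells: discard x
        side = 1 - side  # the evicted key moves to its cell in the other table
    raise RuntimeError("insertion walk exceeded the step limit")


hashes = {"y1": (0, 0), "y2": (0, 0), "x": (0, 0)}  # three keys competing for two cells
T = [[None], [None]]
print([lossy_cuckoo_insert(T, hashes, k) for k in ("y1", "y2", "x")], T)
```

In the example the third key cannot be accommodated; after the failed walk y1 and y2 are back in valid cells and x is the key that is dropped, as the insertion order (by decreasing weight) requires.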

The main drawback of using three tables is, of course, that another memory 
probe is needed for lookups. Furthermore, as the range of the hash functions must 
be smaller than when using two tables, the smallest possible range of quotient 
functions is larger, so more space may be needed for each cell. 



3 Experiments 

An important performance parameter of our lossy dictionaries is the ability to store many keys with high weight. Our first experiment tests this ability for lossy dictionaries using two and three tables. For comparison, we also test the simple one-table scheme that stores in each cell the key of greatest weight hashing to it. The tests were done using truly random hash function values, obtained from a high quality collection of random bits freely available on the Internet [?]. Figure 1 shows experimentally determined values of p_{r,αr}, the probability that the key with index i = αr is stored in the dictionary, determined from 10^? trials. For the experiments with one and two tables we used table size r = 2048, while for the experiment with three tables we used r = 1536. We also tried various other table sizes, but the graphs were almost indistinguishable from the ones shown. From the figure we see the significant improvement of moving from one to more tables. As predicted, nearly all of the r/2 heaviest keys are stored when using two tables. For three tables this number increases to about 0.88r. Of the r heaviest keys, about 84% are stored when using two tables, and 95% are stored when using three tables.

Apart from asymptotically vanishing differences around the point where the curves start falling from 1, the graphs of Fig. 1 seem independent of r. For two tables the observed value of p_{r,αr} for α > 1/2 is approximately 3.5/9.6^α.



3.1 Application 

To examine the practicality of our dictionaries we turn to the real-world example of lossy image compression using wavelets. Today most state-of-the-art image coders, such as JPEG2000, are based on wavelets. The wavelet transform has the ability to efficiently approximate nonlinear and nonstationary signals with coefficients whose magnitudes, in sorted order, decay rapidly towards zero. This is illustrated in Fig. 2, which shows the sorted magnitudes of the wavelet coefficients for the Lena image, a standard benchmark in image processing, computed using Daubechies second-order wavelets. Thresholding the wavelet coefficients by a small threshold, i.e., setting small-valued coefficients to zero, introduces only a small mean squared error (MSE) while leading to a sparse representation that can be exploited for compression purposes. The main idea of most wavelet-based compression schemes is to keep the value and position of the r coefficients of largest magnitude. To this end many advanced schemes, such as zero-tree coding, have been developed. None of these schemes supports access to a single pixel without decoding significant portions of the image.

Fig. 1. The observed probability that the element with (αr)th highest weight is stored. For two tables our lower bound is shown.

Fig. 2. Largest 5000 magnitudes of 67615 wavelet coefficients of the Lena image.

Recently, interest in fast random access to decoded data, accessing only a few wavelet coefficients, has arisen [?]. In [?] we show that lossy dictionaries are well suited for this purpose. Based on our data structure for lossy dictionaries, we present a new approach to lossy storage of the coefficients of wavelet transformed data. The approach supports fast random access to individual data elements within the compressed representation. Compared to the previously best methods in the literature, our lossy dictionary based scheme performs about 50%-80% better in terms of compression ratio, while reducing the random access time by more than 60%. A detailed description of the method is outside the scope of this paper. Instead we use the Lena image to give a flavor of the usefulness of lossy dictionaries on real-world data. We store the coefficients of Fig. 2 in a two-table lossy dictionary of total table size r = 2^?, using a simple family of hash functions. Specifically, we use hash functions of the form

$$h(k) = \bigl((a_2k^2 + a_1a_2k + a_0) \bmod p\bigr) \bmod r/2,$$

where p is a prime larger than any key, 0 < a_0, a_1, a_2 < p and a_1 is even. A corresponding quotient function is

$$q(k) = 2\bigl(((a_2k^2 + a_1a_2k + a_0) \bmod p) \operatorname{div} r/2\bigr) + k \bmod 2.$$
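A small Python check of why the parity bit k mod 2 is needed, with toy parameters chosen by us (p = 1009, r = 64, a_0 = 5, a_1 = 10, a_2 = 7). Two distinct keys give the same polynomial value exactly when a_2(k − k')(k + k' + a_1) ≡ 0 (mod p), i.e. k + k' ≡ −a_1 (mod p). For keys below p/2 (an extra assumption of this demo, stronger than requiring p to exceed every key) and even a_1, this forces k + k' = p − a_1 (or p when a_1 = 0), an odd number, so the two keys differ in k mod 2. The pair (h(k), q(k)) recovers both the polynomial value and the parity, and is therefore injective on such keys.

```python
P = 1009               # a prime larger than every key (toy value)
R = 64                 # combined table size r, so each table has r/2 = 32 cells
A0, A1, A2 = 5, 10, 7  # hypothetical coefficients; a_1 is even


def poly(k: int) -> int:
    return (A2 * k * k + A1 * A2 * k + A0) % P


def h(k: int) -> int:
    # cell index: the polynomial value reduced modulo r/2
    return poly(k) % (R // 2)


def q(k: int) -> int:
    # quotient: the rest of the polynomial value, followed by the parity bit of k
    return 2 * (poly(k) // (R // 2)) + k % 2


keys = range((P + 1) // 2)  # 0..504, all below p/2 (assumption of this demo)
print(len({poly(k) for k in keys}), len({(h(k), q(k)) for k in keys}))
```

Here the five pairs of keys summing to 999 collide on the polynomial alone, but the parity bit separates every such pair.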

Again, 10^? iterations were made, selecting random functions from the above family using C's rand function. The graph of p_{r,αr} is indistinguishable from that in Fig. 1. For our application, we obtain an MSE of 200, which is 27% more than the MSE when storing the r coefficients of largest magnitude. This difference would be difficult at best to detect in the reconstructed image. The previously mentioned family of [?] had somewhat worse performance. Using three tables reduces the MSE increase to a mere 1%.



4 Conclusion 

We have introduced the concept of lossy dictionaries and presented a simple 
and efficient data structure implementing a lossy dictionary. Our data structure 
combines very efficient lookups and near-optimal space utilization, and thus 
seems a promising alternative to previously known data structures when a small 
percentage of false negatives is tolerable. 

Though simple and efficient hash functions seem to work well in practice with our data structure, the challenge of finding such families that provably work well remains. Furthermore, the last two graphs in Fig. 1 are not completely understood. The same is true for the insertion heuristic for three or more tables.
1 Introduction

Computing the Delaunay triangulation of n points is well known to have an Ω(n log n) lower bound. Researchers have attempted to break that bound in special cases where additional information is known.

The Delaunay triangulation of the vertices of a convex polygon is such a case where the lower bound of Ω(n log n) does not hold. This problem has been solved in linear time with a deterministic algorithm of Aggarwal, Guibas, Saxe and Shor [?]. Chew has also proposed a very simple randomized algorithm [?] for the same problem, which we sketch in Sect. 2.2. These two algorithms can also compute the skeleton of a convex polygon in linear time and support the deletion of a point from a Delaunay triangulation in time linear in its degree.

Another result is that if a spanning subgraph of maximal degree d of the Delaunay triangulation is known, then the remaining part of the Delaunay triangulation can be computed in O(nd log* n) expected randomized time [?]. The Euclidean minimum spanning tree is an example of such a graph of bounded degree 6. This O(n log* n) result applies also if the points are the vertices of a chain monotone in both x and y directions but, in this special case, linear complexity has been achieved by Djidjev and Lingas [?], generalizing the result of Aggarwal et al. for convex polygons.

Besides these results, where knowing some information on the points helps to construct the Delaunay triangulation, it has been proven that knowing the order of the points along any one given direction does not help [?].

Breaking a lower bound by using some additional information arises similarly in some other problems. One of the most famous is the triangulation of a simple polygon in linear time [?]. Other related problems are the constrained Delaunay triangulation of a simple polygon in O(n) time [?]; the medial axis of a simple polygon in linear time [?]; the computation of one cell in the intersection of two polygons in O(n (log* n)²) time [?]; the L∞ Delaunay triangulation of points sorted along the x and y axes in O(n log log n) time [?]. Also, given the 3D convex hull of a set of blue points and the convex hull of the set of red points, the convex hull of all points can be computed in linear time [?].

...tion program. The Spanish team was partially supported by CIRIT Gen. Cat. 1999SGR00356, Proyecto DGES-MEC PB98-0933 and Acción Integrada Francia-España HF99-112.

The problem we address in this paper is the following: given the Delaunay triangulation DT(S) of a point set S in the plane and a partition of S into S_1, S_2, can we compute both DT(S_1) and DT(S_2) in o(n log n) time?

The reverse problem, given a partition of S into S_1, S_2, reconstruct DT(S) from DT(S_1) and DT(S_2), is known to be doable in linear time [?]. Indeed, the 3D convex hull of the vertices of two convex polyhedra can be computed in linear time [?] and, by the standard transformation of the Delaunay triangulation to the convex hull, we get the result. This reverse operation can be used as the merging step of a divide and conquer algorithm.

In this paper, we propose an O(n) randomized algorithm in the spirit of Chew’s algorithm for the Delaunay triangulation of a convex polygon.

2 Preliminaries 

We assume in the sequel that a triangulation allows constant time access from a triangle to its three neighbors and to its three vertices, and from a vertex to one incident triangle. This is provided by any reasonable representation of a triangulation, either based on triangles [?] or as in the DCEL or winged-edge structure [?, pp. 31–33].

2.1 Classical Randomized Incremental Constructions 

Randomized incremental constructions have been widely used for geometric problems [?] and specifically for the Delaunay triangulation [?]. These algorithms insert the points one by one in a random order in some data structure to locate the new point and update the triangulation. The location step has an O(log n) expected complexity. The update step has constant expected complexity, as can easily be proved by backwards analysis [?]. Indeed, the update cost of inserting the last point in the triangulation is its degree in the final triangulation. Since the last point is chosen randomly, its expected insertion cost is the average degree of a planar graph, which is less than 6.
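The bound on the average degree is the usual consequence of Euler's formula: a simple planar graph with n ≥ 3 vertices and edge set E has |E| ≤ 3n − 6, so

$$
|E| \le 3n - 6 \quad\Longrightarrow\quad \frac{1}{n}\sum_{v}\deg(v) = \frac{2|E|}{n} \le 6 - \frac{12}{n} < 6 .
$$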

2.2 Chew’s Algorithm for the Delaunay Triangulation of a Convex Polygon

Chew’s algorithm [?] for the Delaunay triangulation of a convex polygon uses the ideas above for the analysis of the insertion of the last point. The main idea is to avoid the location cost using the additional information of the convex polygon.

As noted earlier, for any vertex v we know one of its incident triangles. In the case of Chew’s algorithm, it is required that the triangle in question be incident to the convex hull edge following v in counterclockwise order.
The algorithm can be stated as follows:

1. Choose a random vertex p of the polygon P.

2. Store the point q before p on the convex hull.

3. Compute the convex polygon P \ {p}.

4. Compute recursively DT(S \ {p}).

5. Let t be the triangle pointed to by q.

6. Create a triangle neighbor of t with p as vertex, flip diagonals if necessary using the standard Delaunay criterion and update links from vertices to incident triangles.
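Step 6 relies on the standard Delaunay (in-circle) test. A minimal Python sketch of that predicate, with names of our choosing: for a counterclockwise triangle abc, the determinant below is positive exactly when d lies strictly inside the circle through a, b, c, and an edge ac shared with the counterclockwise triangle acd must then be flipped. Integer coordinates keep the test exact; with floating-point input, robust implementations use adaptive exact arithmetic instead.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc: > 0 iff a, b, c are in counterclockwise order."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """> 0 iff d lies strictly inside the circle through a, b, c (given counterclockwise)."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    # 3x3 determinant expanded along the first row
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))


def must_flip(a, b, c, d):
    """Edge ac, shared by counterclockwise triangles abc and acd, violates the
    Delaunay criterion iff d lies inside the circumcircle of abc."""
    return in_circle(a, b, c, d) > 0


a, b, c = (0, 0), (6, 0), (0, 6)  # counterclockwise triangle
print(must_flip(a, b, c, (-1, 3)), must_flip(a, b, c, (-2, 2)))  # True False
```

In Chew's algorithm this test is applied only to the edges around the newly inserted point p, which is what backwards analysis bounds by a constant in expectation.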

By standard backwards analysis, the flipping step has expected constant cost. 
Other operations, except the recursive call, require constant time. Thus we get 
a linear expected complexity. 

The important thing is that we avoid the location step. Thus Chew’s algo- 
rithm applies to other cases where the location step can be avoided, e.g. deletion 
of a point in a Delaunay triangulation. 

3 Algorithm 

3.1 General Scheme 

The main idea is similar to Chew’s algorithm: delete a random point p ∈ S_i from DT(S), split the triangulation, and then insert p into the triangulation DT(S_i \ {p}), avoiding the usual location step. Locating p can be done by computing the nearest neighbor of p in S_i, which takes time T(p) log T(p) for some number T(p) depending on p whose expectation is O(1) (details are given in Step 2 of the algorithm in Section 3.3). However, it is possible, for example, to have one point p, chosen with probability 1/n, such that T(p) = n, which brings the expected cost E(T(p) log T(p)) of this step to Ω(log n), and hence the total cost to Ω(n log n). The idea is to choose two points p_α, p_β and to take for p the better of the two, in order to concentrate the distribution around its mean. Here is the algorithm. Given DT(S):

1. Choose two random points p_α, p_β ∈ S. Let i, j ∈ {1, 2} be such that p_α ∈ S_i and p_β ∈ S_j (i and j need not be different).

2. Search simultaneously for the nearest neighbor of p_α in S_i and the nearest neighbor of p_β in S_j. As soon as one of the two is found, say the neighbor q of p_α in S_i, stop both searches and let p be p_α.

3. Remove p from DT(S) to get DT(S \ {p}).

4. Recursively compute DT(S_1 \ {p}) and DT(S_2 \ {p}) from DT(S \ {p}).

5. Determine the triangle of DT(S_i \ {p}) incident to q that is traversed by the segment pq.

6. Apply the usual Delaunay flip procedure to obtain DT(S_i) from DT(S_i \ {p}).
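Step 2 runs two searches "simultaneously". One way to realize this, sketched below under our own assumptions, is to interleave the two searches one step at a time. Each search is modelled as a Python generator that yields None while it is still working and yields its answer when it is done; this protocol and the names `race` and `countdown` are hypothetical. The total work is then at most twice that of the faster search, which is why the cost of this step behaves like the minimum Z studied in Section 3.2.

```python
def race(search_a, search_b):
    """Advance two step-wise searches alternately, one step each.

    Each search is a generator that yields None while it is still working and
    yields its result (anything else) once it has finished; both are assumed
    to finish eventually. Returns (winner, result, steps), where winner is 0
    or 1 and steps counts the steps of both searches together, which is at
    most twice the number of steps needed by the faster search."""
    steps = 0
    while True:
        for winner, search in enumerate((search_a, search_b)):
            steps += 1
            result = next(search)
            if result is not None:
                return winner, result, steps


def countdown(n, label):
    """Toy search that needs n steps before reporting `label`."""
    for _ in range(n - 1):
        yield None
    yield label
```

For example, `race(countdown(5, "a"), countdown(3, "b"))` stops after six steps in total, as soon as the second search reports `"b"`, without ever finishing the first one.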

3.2 Combination Lemmas 

Note that in the algorithm, p is not a random point uniformly distributed among 
S, but one chosen among two random points. In this section, we investigate how 
this choice influences the mean value of some variable depending on p. 
Let X(p) be a positive random variable depending on a point p, bounded above by n and of constant expected value E(X) = c. Since p is one point among n, X(p) can take n values n ≥ X_1 ≥ X_2 ≥ ... ≥ X_n ≥ 0, and Σ X_i = n · c.

Pick i, j uniformly in {1, ..., n} independently and let Y = max{X_i, X_j} and Z = min{X_i, X_j}.

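Lemma 1. E(Y) ≤ 2c.

Proof.
$$E(Y)=\frac{1}{n^2}\sum_{1\le i,j\le n}\max(X_i,X_j)=\frac{1}{n^2}\sum_{1\le i\le n}\Big(\sum_{1\le j\le i}X_j+\sum_{i<j\le n}X_i\Big)\le\frac{2n}{n^2}\sum_{1\le i\le n}X_i=2c.\qquad\square$$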



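Lemma 2. If f is a concave non-decreasing function, then
$$E(Z\,f(Z))\le 2c\,f(c).$$

Proof.
$$E(Z\,f(Z))=\frac{1}{n^2}\sum_{1\le i,j\le n}\min(X_i,X_j)\,f(\min(X_i,X_j))=\frac{1}{n^2}\sum_{1\le j\le n}(2j-1)\,X_j\,f(X_j),$$
since min(X_i, X_j) = X_{max(i,j)} and exactly 2j − 1 ordered pairs (i, j) have maximum index j. Clearly, j X_j ≤ Σ_{1≤i≤j} X_i ≤ Σ_{1≤i≤n} X_i = cn; hence
$$E(Z\,f(Z))\le\frac{2cn}{n^2}\sum_{1\le j\le n}f(X_j)\le\frac{2c}{n}\cdot n\cdot f(c)=2c\,f(c)$$
by concavity of f. □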
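Both lemmas can be checked numerically by enumerating all n² ordered pairs (i, j). The sketch below does this for a skewed example in which one value equals n and all others equal 1, with f(x) = log(1 + x) as an illustrative concave, non-decreasing, non-negative function (our choice, not the paper's); the helper name is hypothetical.

```python
import math
from itertools import product


def combination_expectations(xs, f):
    """Exact E(Y) and E(Z f(Z)) for Y = max(X_i, X_j), Z = min(X_i, X_j),
    with i and j drawn independently and uniformly from the n indices."""
    n = len(xs)
    pairs = list(product(xs, repeat=2))
    e_y = sum(max(a, b) for a, b in pairs) / n ** 2
    e_zf = sum(min(a, b) * f(min(a, b)) for a, b in pairs) / n ** 2
    return e_y, e_zf


n = 300
xs = [float(n)] + [1.0] * (n - 1)      # one point with X = n, all others 1
c = sum(xs) / n
f = math.log1p                          # concave, non-decreasing, f >= 0
e_y, e_zf = combination_expectations(xs, f)
single = sum(x * f(x) for x in xs) / n  # E(X f(X)) for a single random pick
```

For this example the single-pick expectation of X f(X) is about 6.4, whereas the min-of-two expectation E(Z f(Z)) is about 0.71: taking the better of two random points removes the influence of the single expensive point.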




316 



B. Chazelle et al. 



3.3 Algorithmic Details and Randomized Analysis 

Referring to the six different steps of the algorithm, here is a detailed cost anal- 
ysis: 

1. Done in time O(1).

2. The nearest neighbor in S_i of a point p ∈ S_i can be found in the following way. Start by considering all the Delaunay edges incident to p. Put the corresponding neighbors in a priority queue, ordered by increasing distance to p. Explore the queue in the following way: each time we consider a point q, there are two possibilities:

- If q ∈ S_i, we are done: q is p’s nearest neighbor in S_i.

- If q ∉ S_i, insert all its Delaunay neighbors into the queue, delete q, and proceed to the next point in the queue.

The correctness of this process is based on the fact that it simulates the way in which a circle centered at p would grow. In other words, if q ∈ S_i is the point we are looking for, the algorithm computes and orders all the points that are closer to p than q (obviously, none of them belongs to S_i). The proof is based on the following observation.

Fact. Let S be a set of points. Let C be any disk in the plane that contains a point s ∈ S on its boundary. Let p_1, ..., p_k be all the points of S contained in C. Then s must have a Delaunay neighbor among p_1, ..., p_k.

Proof. Grow a circle C_s through s, tangent to C and interior to C, until it reaches the first point p_i (see Fig. 1).



The emptiness of C_s is obvious, and therefore sp_i is a Delaunay edge. □
In this procedure, we have explored and ordered all the points that lie closer to p than q, together with all their neighbors. Can T(p), the number of such points, be too large on average? As the randomly chosen point can belong either to S_1 or to S_2, we want to bound the following quantity:
$$E(T)=\frac{1}{n}\Big(\sum_{p\in S_1}\sum_{q\in D(p,NN_1(p))}\deg(q)+\sum_{p\in S_2}\sum_{q\in D(p,NN_2(p))}\deg(q)\Big),$$




Fig. 1. The points s and p_i are Delaunay neighbors.
where NN_i(p) denotes the nearest neighbor of p in S_i, D(p, s) is the disk with center p passing through s, and deg(q) denotes the degree of q in DT(S). We bound the summands in the following way:

$$\sum_{p\in S_1}\sum_{q\in D(p,NN_1(p))}\deg(q)=\sum_{q\in S_2}\ \sum_{p\ \text{s.t.}\ q\in D(p,NN_1(p))}\deg(q)=\sum_{q\in S_2}\deg(q)\cdot\#\{p\ \text{s.t.}\ q\in D(p,NN_1(p))\}\le 6\sum_{q\in S_2}\deg(q).$$

The last inequality holds because the number of disks of the form D(p, NN_1(p)) that can contain a point q ∈ S_2 is at most 6: in the set S_1 ∪ {q}, such a point p would have q as its nearest neighbor, and the maximum degree of q in the nearest neighbor graph of S_1 ∪ {q} is 6.

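Thus, since the degrees in the planar graph DT(S) sum to less than 6n, we get
$$E(T)\le\frac{6}{n}\Big(\sum_{q\in S_2}\deg(q)+\sum_{q\in S_1}\deg(q)\Big)<36.$$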

Since the algorithm requires a priority queue, the cost of searching for q is O(T log T) if we use a balanced priority queue, or even O(T²) if we use a simple list to implement the queue, and E(T²) cannot be bounded by a constant. But the time for deciding which of p_α and p_β will be p is the minimum of the times for finding the neighbors of p_α and p_β, and is thus expected to be constant by Lemma 2. This step has expected cost O(1).

3. It is known that this can be done in time proportional to the degree of p with Chew’s algorithm. Since the expected degree of a random point is less than 6, the expected degree of p is smaller than 12 by Lemma 1. Hence this step has expected cost O(1).

4. If the cost of the algorithm is T(n), this step can be done in time T(n − 1).

5. Exploring all the triangles incident to q takes time proportional to the degree of q in DT(S_i \ {p}). But q is not a random point: it is the nearest neighbor of p, which is itself chosen among two random points. We prove below that the expected degree of the nearest neighbor in S_i of a random point p ∈ S_i is at most 42; thus, by Lemma 1, the expected degree of q is less than 84, and this step can be done in expected time O(1).

Fact. Given a random point p in a set of points R, the expected degree in DT(R \ {p}) of the nearest neighbor of p in R is at most 42.

Proof. We have to consider the degree of a point in several graphs. Let deg_NN(q) be the degree of q in the nearest neighbor graph of R, deg(q) the degree of q in DT(R), and deg_p(q) the degree of q in DT(R \ {p}). It is known that deg_NN(q) is at most 6. When p is removed from DT(R), the new neighbors of q are former neighbors of p; thus deg_p(q) ≤ deg(p) + deg(q).
The expected value of deg_p(NN(p)) is:
$$E\big(\deg_p(NN(p))\big)=\frac{1}{n}\sum_{p\in R}\deg_p(NN(p))\le\frac{1}{n}\sum_{p\in R}\big(\deg(p)+\deg(NN(p))\big)\le 6+\frac{1}{n}\sum_{\substack{p,q\in R\\ q=NN(p)}}\deg(q)\le 6+\frac{1}{n}\sum_{q\in R}\deg_{NN}(q)\,\deg(q)\le 6+\frac{1}{n}\sum_{q\in R}6\deg(q)\le 6+36=42.\qquad\square$$

6. It is known that this step can be done in time proportional to the degree of p, that is, in expected time O(1) by Lemma 1.
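The nearest neighbor search of step 2 can be sketched as a best-first traversal of the Delaunay graph, with Python's `heapq` module as the priority queue. This is an illustrative sketch under stated assumptions: the Delaunay graph is given as an adjacency dictionary and membership in S_i as a list of booleans. Only to keep the example self-contained, the graph is built by a brute-force O(n⁴) empty-circle test that is not part of the algorithm. All function names are hypothetical.

```python
import heapq
from itertools import combinations


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """Positive iff d lies strictly inside the circle through ccw triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    return ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
            - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
            + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))


def delaunay_adjacency(pts):
    """Brute-force Delaunay graph (O(n^4)); only to make the example runnable."""
    n = len(pts)
    adj = {i: set() for i in range(n)}
    for i, j, k in combinations(range(n), 3):
        o = orient(pts[i], pts[j], pts[k])
        if o == 0:
            continue
        if o < 0:
            j, k = k, j                  # make (i, j, k) counterclockwise
        if all(in_circle(pts[i], pts[j], pts[k], pts[m]) <= 0
               for m in range(n) if m not in (i, j, k)):
            adj[i] |= {j, k}
            adj[j] |= {i, k}
            adj[k] |= {i, j}
    return adj


def nearest_in_subset(adj, pts, p, in_subset):
    """Nearest neighbor of p among points q != p with in_subset[q] True.

    Simulates a disk growing around p: points leave the heap in order of
    increasing distance to p. Returns (q, number of points popped)."""
    px, py = pts[p]

    def dist2(v):
        return (pts[v][0] - px) ** 2 + (pts[v][1] - py) ** 2

    heap = [(dist2(v), v) for v in adj[p]]
    heapq.heapify(heap)
    seen = set(adj[p]) | {p}
    popped = 0
    while heap:
        _, q = heapq.heappop(heap)
        popped += 1
        if in_subset[q]:
            return q, popped
        for v in adj[q]:                 # q is not in S_i: expand its neighbors
            if v not in seen:
                seen.add(v)
                heapq.heappush(heap, (dist2(v), v))
    return None, popped
```

The returned counter is the number of points removed from the queue, that is, the points closer to p than q plus q itself. The paper's T(p) additionally charges for scanning their neighbors.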

As a conclusion, we have proved the following:

Theorem 3. Given a set of n points S and its Delaunay triangulation, for any partition of S into two disjoint subsets S_1 and S_2, the Delaunay triangulations DT(S_1) and DT(S_2) can be computed in O(n) expected time.

4 Concluding Remarks 

4.1 Alternative Ideas 

We should mention several simpler ideas that do not work. A first idea consists in deleting all the points of S_2 from DT(S) in a random order, but the degree of a random point in S_2 cannot be controlled. In fact, if we take points on the part of the unit parabola with positive abscissa, the Delaunay triangulation links the point of highest curvature to all others (see Fig. 2). If we split the set into two parts along the parabola and remove the highest-curvature half of the point set in a random order, then the probability of removing the highest-curvature point increases as the set of points shrinks, and the expected time to remove half the points is Ω(n log n).
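The parabola example can be checked with a brute-force empty-circle test (O(n⁴), suitable only for tiny inputs; the helper names are hypothetical). For points (x, x²) with 0 < x_1 < ... < x_n, the triangulation is the fan around the point with smallest abscissa, so that point has degree n − 1 while every other point has degree at most 3.

```python
from itertools import combinations


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circle(a, b, c, d):
    """Positive iff d lies strictly inside the circle through ccw triangle abc."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    return ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
            - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
            + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))


def delaunay_degrees(pts):
    """Vertex degrees in the Delaunay triangulation, by brute force (O(n^4))."""
    n = len(pts)
    nbrs = {i: set() for i in range(n)}
    for i, j, k in combinations(range(n), 3):
        o = orient(pts[i], pts[j], pts[k])
        if o == 0:
            continue
        if o < 0:
            j, k = k, j                  # make (i, j, k) counterclockwise
        if all(in_circle(pts[i], pts[j], pts[k], pts[m]) <= 0
               for m in range(n) if m not in (i, j, k)):
            for u, v in ((i, j), (j, k), (i, k)):
                nbrs[u].add(v)
                nbrs[v].add(u)
    return {i: len(s) for i, s in nbrs.items()}


# Points on the unit parabola y = x^2 with positive abscissa, sorted by x;
# index 0 is the point of highest curvature.
xs = [0.3, 0.7, 1.1, 1.6, 2.0, 2.6, 3.1]
deg = delaunay_degrees([(x, x * x) for x in xs])
```

Deleting the point of index 0 therefore costs n − 1, and after its deletion the next point along the parabola takes over the same role.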

Another idea is to remove the points not at random but by increasing degree; in that case, however, the set of points to remove must be kept sorted by degree, although the degrees change during the algorithm.

4.2 Convex Hull in 3D 

Through the projection of the plane on a paraboloid in 3D, Delaunay triangu- 
lations are closely related to convex hulls in three dimensions. 

Unfortunately, our algorithm, or more precisely its complexity analysis, does not generalize to 3D convex hulls. In this paper we use the fact that the nearest neighbor graph is a subgraph of the Delaunay triangulation having bounded degree; to generalize the algorithm, we would need to define a neighboring relation that is a subgraph of the convex hull. Several possibilities for such a subgraph exist, but they do not provide bounded degree, and thus the analysis does not generalize.

Fig. 2. Points on a parabola

Acknowledgments. The authors thank Oswin Aichholzer for various discus- 
sions about this problem. 
Abstract. Let C be a simple polygonal chain of n edges in the plane, and let p and q be two points on C. The detour of C on (p, q) is defined to be the length of the segment of C that connects p with q, divided by the Euclidean distance between p and q. Given an ε > 0, we compute in time O(n log n) a pair of points on which the chain makes a detour at least 1/(1 + ε) times the maximum detour.



1 Introduction 

A transportation route like a highway is supposed to provide a reasonably short connection between the points it passes through. More precisely, for any two points p and q on an open curve C in the plane that consists of a bounded number of smooth pieces, we call the value
$$d_C(p,q)=\frac{|C_p^q|}{|pq|}$$
the detour of C on the pair (p, q); here C_p^q denotes the unique segment of C that connects p with q, |C_p^q| denotes its length, and |pq| is the Euclidean distance between p and q.

Clearly, d_C(p, q) ≥ 1 holds. We assume that C does not self-intersect. Then d_C(p, q) is bounded, and it tends to 1 as q tends to p along C. We are interested in the value
$$d_C=\max_{p,q\in C}d_C(p,q),$$
called the maximum detour of curve C.
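The definition of d_C(p, q) translates directly into code when points of a polygonal chain are addressed by their arc-length parameter. The sketch below assumes that the chain is a list of vertices without zero-length edges and that p and q are given by their arc lengths s and t measured from the first vertex; the function names are ours.

```python
import math


def arc_lengths(chain):
    """Cumulative chain length at each vertex of the polygonal chain."""
    s = [0.0]
    for a, b in zip(chain, chain[1:]):
        s.append(s[-1] + math.dist(a, b))
    return s


def point_at(chain, s_cum, s):
    """Point of the chain at arc-length parameter s, 0 <= s <= s_cum[-1]."""
    for i in range(len(chain) - 1):
        if s <= s_cum[i + 1] or i == len(chain) - 2:
            t = (s - s_cum[i]) / (s_cum[i + 1] - s_cum[i])
            (ax, ay), (bx, by) = chain[i], chain[i + 1]
            return (ax + t * (bx - ax), ay + t * (by - ay))
    raise ValueError("chain needs at least two vertices")


def detour(chain, s, t):
    """d_C(p, q) = |C_p^q| / |pq| for the points at arc lengths s != t."""
    s_cum = arc_lengths(chain)
    p, q = point_at(chain, s_cum, s), point_at(chain, s_cum, t)
    return abs(t - s) / math.dist(p, q)
```

For the U-shaped chain (0, 0), (0, 1), (1, 1), (1, 0), the two endpoints are at distance 1 but 3 apart along the chain, so `detour(chain, 0.0, 3.0)` is 3.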

The detour of a curve is an important notion in analyzing on-line navigation strategies, where the length of a path created by some robot must be compared to the shortest path connecting two points; see, e.g., Icking and Klein [?]. A different type of application is in comparing the Fréchet distance between two curves with their Hausdorff distance; while the first is always greater or equal
C has no self-intersections.

If C passes through the same point p twice, then d_C(p, q) is unbounded for certain points q close to p.
than the latter, there is, in general, no bound for the other direction. Only for curves of bounded maximum detour can such a bound be shown; see Alt et al. [?]. The vertex-to-vertex maximum detour of graphs was already considered as the dual t-spanner problem by Narasimhan and Smid [?]. They provide an approximation of the vertex-to-vertex stretch factor in a more general setting.

Intuitively, a curve that does not meander wildly should have a small maximum detour. This idea can be made precise in several ways. An oriented curve running from s to t is called self-approaching if, for each point p, the curve segment C_p^t fits in a 90° wedge with apex at p. The maximum detour of self-approaching curves has been tightly bounded by 5.3331...; see Icking et al. [7]. This result can be generalized to wedges of arbitrary angles; see Aichholzer et al. [?]. Rote [?] has shown a tight upper bound of 2π/3 for the detour of curves of increasing chords, i.e., curves that are self-approaching in both directions.

These results were all obtained in an indirect way, by bounding the curve’s 
length by the perimeter of some simple, convex container. 

In this paper we present an O(n log n) algorithm for computing directly the maximum detour of an arbitrary polygonal chain of n edges, up to a factor of 1 + ε. The paper is organized as follows. In Sect. 2 we first state a local criterion that is necessary for a pair of points (p, q) on which the chain takes on its maximum detour. As a consequence, one of p, q can be assumed to be a vertex of the chain.

In Sect. 3 we prove some global properties. It turns out that the maximum detour is always attained by a vertex-edge cut of the chain C, that is, by a pair (p, q) of co-visible points of the chain, one of which is a vertex whereas the other may be an interior point of an edge.

Moreover, we prove a certain property of the detours related to cuts that cross each other. While this property is weaker than the Monge property (see Burkard et al. [?]), it does imply that cuts attaining the maximum detour do not cross and must, therefore, be linear in number.

In Sect. 4 we present an algorithm that computes, for a given real number ε > 0, within time O(n log n) a vertex-edge cut (p, q) such that the maximum detour of chain C is at most 1 + ε times the detour of C on (p, q). Our algorithm uses the result by Gutwin and Keil [5] on constructing sparse spanners for finding a pair of vertices of C whose detour is close to the maximum detour that chain C makes on all vertex pairs.

Finally, in Sect. 5 we mention some open problems that naturally arise from this work.



The relationship between wedge containment and detour seems to work in only one way, because there are curves of arbitrarily small maximum detour that do not fit in small wedges.
2 Local Properties



Throughout this paper, let C be a simple, planar, polygonal chain of n edges. 
Simple means that C does not have self-intersections. That is, if any two edges 
intersect at all they must be neighbors in the chain, and their intersection is just 
a common vertex. 

Now let p and q denote two points on C, and let
$$d_C(p,q)=\frac{|C_p^q|}{|pq|}$$
be the detour of C on (p, q). We want to analyze how the detour changes at points close to p on the same edge e, while q remains fixed. Let the positive direction of e be such that the length of the chain segment increases, and let β denote the angle between the positive part of e and the line segment pq; see Fig. 1. Excluding trivial cases, we assume that 0 < β < π holds.




Lemma 1. For a fixed point q, the function d_C(·, q) takes on a unique maximum on edge e. If
$$\cos\beta=-\frac{|pq|}{|C_p^q|}$$
holds, then this maximum is attained at p. If cos β is bigger (resp. smaller) than the right-hand side, the detour can be increased by moving p forwards (resp. backwards).



Proof. Let p(t) be the point on edge e that lies at distance |t| behind or before p = p(0), depending on the sign of t. By the law of cosines we have
$$d_C(p(t),q)=\frac{t+|C_p^q|}{\sqrt{t^2+|pq|^2-2t\,|pq|\cos\beta}}$$
for positive and negative values of t. The derivative with respect to t has a positive denominator. Its numerator has the same sign as
$$|pq|\,\frac{|pq|+|C_p^q|\cos\beta}{|pq|\cos\beta+|C_p^q|}-t,$$
because the term |pq| cos β + |C_p^q| is positive. This implies the claims. □
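The derivative step in the proof above can be spelled out as follows. This is a short worked derivation in which we abbreviate L = |C_p^q| and D = |pq|; these abbreviations are ours, not the paper's.

$$Q(t)=t^2+D^2-2tD\cos\beta,\qquad \frac{d}{dt}\,\frac{t+L}{\sqrt{Q(t)}}=\frac{Q(t)-(t+L)(t-D\cos\beta)}{Q(t)^{3/2}},$$
$$Q(t)-(t+L)(t-D\cos\beta)=D^2+LD\cos\beta-t\,(L+D\cos\beta)=(L+D\cos\beta)\Big(D\,\frac{D+L\cos\beta}{D\cos\beta+L}-t\Big).$$

Since L > D whenever the chain segment is not straight, the factor L + D cos β is positive, so the numerator changes sign exactly once, from positive to negative, at t* = D(D + L cos β)/(D cos β + L). In particular, t* = 0 exactly when cos β = −D/L, which is the condition of Lemma 1.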

We obtain the following important consequence: 



Lemma 2. Any polygonal chain makes its maximum detour on a pair of points 
at least one of which is a vertex. 



Proof. Let (p, q) be a pair of points on which the chain C attains its maximum detour, and assume that neither of p, q is a vertex of C. By Lemma 1, the line segment pq must form the same angle
$$\beta=\arccos\Big(-\frac{|pq|}{|C_p^q|}\Big)$$
with the two edges containing p and q. Otherwise, the detour could be increased by moving one of the points. But then the detour d_C(p, q) remains constant as we move both points simultaneously until one of them reaches the endpoint of its edge; see Fig. 2. In fact, moving each point by t, we have
$$d_C(p',q')=\frac{|C_p^q|+2t}{|pq|-2t\cos\beta}=\frac{|C_p^q|+2t}{|pq|\,\big(1+2t/|C_p^q|\big)}=\frac{|C_p^q|}{|pq|}=d_C(p,q).$$
□




Fig. 2. Chain C attains its maximum detour on both pairs, (p, q) and (p', q'). 



If we are given a point p and an edge e of the chain, we can apply Lemma 1 to determine, in time O(1), the unique point q on e that maximizes the detour d_C(p, ·) on this edge. If we do this for all vertex-edge pairs (p, e) of the chain C, then the maximum value encountered will be the maximum detour of C, by Lemma 2.

This approach yields an O(n²) algorithm for computing the maximum detour.
In the next section we will discuss how to improve on this bound using global 
properties. 
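The O(n²) algorithm can be sketched as follows. For a vertex p and an edge not incident to p, we parameterize the edge from its end nearer to p along the chain, so that the chain length to p grows with the parameter t. By the proof of Lemma 1, the detour is unimodal in t, with its unconstrained maximum at t* = D(D + L cos β)/(D cos β + L), where D and L are the distance and the chain length from the edge's starting point to p; clamping t* to the edge therefore gives the best point. The notation D, L and the helper names are our own.

```python
import math


def best_point_on_edge(p, start, end, base_len):
    """Maximize (base_len + t) / |x(t) p| over x(t) = start + t*u, 0 <= t <= |start end|.

    u is the unit vector from start to end, i.e. the direction in which the
    chain length to p grows; base_len is the chain length from start to p.
    Returns (detour, point)."""
    ell = math.dist(start, end)
    ux, uy = (end[0] - start[0]) / ell, (end[1] - start[1]) / ell
    dx, dy = p[0] - start[0], p[1] - start[1]
    d = math.hypot(dx, dy)                       # D = |start p|
    cos_b = (ux * dx + uy * dy) / d              # cos(beta) measured at start
    denom = base_len + d * cos_b                 # L + D cos(beta) >= 0
    t = d * (d + base_len * cos_b) / denom if denom > 1e-12 else 0.0
    t = min(max(t, 0.0), ell)                    # unimodal (Lemma 1): clamp
    x = (start[0] + t * ux, start[1] + t * uy)
    return (base_len + t) / math.dist(x, p), x


def max_detour_quadratic(chain):
    """Maximum detour over all vertex-edge pairs in O(n^2) time (Lemmas 1 and 2)."""
    s = [0.0]
    for a, b in zip(chain, chain[1:]):
        s.append(s[-1] + math.dist(a, b))
    best = (1.0, None)
    for i, p in enumerate(chain):
        for j in range(len(chain) - 1):
            if j in (i - 1, i):
                continue                         # edges at p only give detour 1
            if j > i:                            # edge after p along the chain
                cand = best_point_on_edge(p, chain[j], chain[j + 1], s[j] - s[i])
            else:                                # edge before p along the chain
                cand = best_point_on_edge(p, chain[j + 1], chain[j], s[i] - s[j + 1])
            if cand[0] > best[0]:
                best = (cand[0], (p, cand[1]))
    return best
```

On the U-shaped chain (0, 0), (0, 1), (1, 1), (1, 0), the sketch returns detour 3, attained between the two endpoints.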



3 Global Properties 



The first observation is that we do not need to consider pairs (p, q) of points that cannot see each other because of C.



Lemma 3. The maximum detour of C is attained by a vertex-edge cut (p, q), where p is a vertex, q is a point on some edge, and p, q are co-visible.

Proof. Let p, q be two arbitrary points of C, and let p = p_0, p_1, ..., p_k = q be the points of C intersected by the line segment pq, ordered by their appearance on pq. For each pair (p_i, p_{i+1}) of consecutive points, let C_i denote the segment of C that connects them. These segments need not be disjoint, but together they cover C_p^q, so the sum of their lengths is at least as large as |C_p^q|. Hence,
$$d_C(p,q)=\frac{|C_p^q|}{\sum_{i=0}^{k-1}|p_ip_{i+1}|}\le\frac{\sum_{i=0}^{k-1}|C_i|}{\sum_{i=0}^{k-1}|p_ip_{i+1}|}\le\max_{0\le i\le k-1}\frac{|C_i|}{|p_ip_{i+1}|}=\max_{0\le i\le k-1}d_C(p_i,p_{i+1}).$$
To prove the last inequality, we note that if a_i/b_i ≤ c holds for all i, then Σ a_i / Σ b_i ≤ c follows. □

Hershberger has shown how to compute all co-visible vertex-vertex pairs of a simple polygon in time proportional to their number. Lemma 3 would invite us to generalize this algorithm to the m co-visible vertex-edge pairs of a chain, and to obtain an O(m) algorithm for computing the maximum detour. Unfortunately, m can still be quadratic in n.

An interesting example is the case of a convex chain C whose total turning angle is less than π. There are Ω(n²) co-visible pairs of vertices, but one can show that the maximum detour is always attained at an end point of C. Thus, there are only O(n) vertex-edge candidate pairs to be checked. One

Two points, p and q, are called co-visible if the line segment connecting them contains 
no points of the chain C in its interior. 
end point of C can attain the maximum detour with a point on each edge of C; such a chain can be constructed by approximating an exponential spiral, which is defined by the fact that all tangents form the same angle with the radii to the spiral’s center.

Now we show that even for general chains there is at most a linear number of vertex-edge cuts that can attain the maximum detour. To this end, we prove the following fact, illustrated by Fig. 3.



Fig. 3. The minimum of d_C(p, q) and d_C(r, s) is less than the maximum of d_C(r, q) and d_C(p, s), in either case.



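Lemma 4. Let p, r, q, s be consecutive points on C, and assume that (p, q) and (r, s) are two cuts of C that cross each other. Then
$$\min\{d_C(p,q),\,d_C(r,s)\}\le\max\{d_C(r,q),\,d_C(p,s)\}.$$
The same statement holds if the points appear in order p, r, s, q on C.

Proof. Let us assume that d_C(p, q) ≤ d_C(r, s) and d_C(p, q) > d_C(r, q) hold. We have to show d_C(p, q) ≤ d_C(p, s). By the triangle inequality, we have
$$|ps|+|rq|\le|pq|+|rs|,$$
therefore
$$|C_p^q|\,(|ps|+|rq|)\le|C_p^q|\,(|pq|+|rs|)\le|C_p^q|\,|pq|+|C_r^s|\,|pq|=|C_p^s|\,|pq|+|C_r^q|\,|pq|\le|C_p^s|\,|pq|+|C_p^q|\,|rq|.$$
Here the second inequality uses d_C(p, q) ≤ d_C(r, s), the equality uses |C_p^q| + |C_r^s| = |C_p^s| + |C_r^q| for the order p, r, q, s, and the last inequality uses d_C(r, q) < d_C(p, q). Subtracting |C_p^q| |rq| from both sides gives |C_p^q| |ps| ≤ |C_p^s| |pq|, and the claim follows. □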



This property is weaker than the Monge property (see Burkard et al. [?])
$$d_C(p,q)+d_C(r,s)\le d_C(p,s)+d_C(r,q),$$
which is not always fulfilled here. But we can draw the following consequence.
Lemma 5. Let (p, q) and (r, s) be two vertex-edge cuts that attain the maximum detour d_C. Then the cuts do not cross. Consequently, there are only O(n) such cuts altogether.

Proof. If the line segments pq and rs crossed, then chain C would visit the points p, q, r, s in one of the two ways depicted in Fig. 3, and from Lemma 4 we would obtain a contradiction to the maximality of the cuts. By Euler’s formula for planar graphs, there can be only O(n) non-crossing vertex-edge cuts. □

The non-crossing property shown in Lemma 4 and Lemma 5 need not be fulfilled for locally optimal cuts. For example, there can be cuts (p, q), (r, s) satisfying
$$d_C(p,q)=\max_{q'}d_C(p,q'),\qquad d_C(r,s)=\max_{s'}d_C(r,s')$$
that cross each other; see Fig. 4.




Fig. 4. Locally optimal cuts can cross. 



4 An Efficient Algorithm 

First, we have to solve two restricted versions of the maximum detour problem.

Lemma 6. Let φ be a given angle. A cut attaining the maximum detour among all vertex-edge cuts in direction φ can be found in time O(n log n).

Proof. Let us assume that φ = π/2 holds, so that we are interested only in vertical vertex-edge cuts. For each vertex p of the chain C we construct its upper
Formally, we can identify the non-vertex endpoints of all cuts hitting the same edge with one extra vertex.
and lower vertical extension, i.e., the vertical line segments that connect p to the first points of C above and below; see Fig. 5. This vertical decomposition was used by Seidel [?] for the purpose of point location. Using a sweep algorithm, we can construct it in time O(n log n). Once all vertical extensions are available, we traverse the chain and mark each endpoint of each vertical segment with the current odometer reading, i.e., the chain length traversed so far. When a segment is encountered for the second time, we can thus compute its detour, and determine the maximum of all. □




Fig. 5. The vertical extensions of the vertices of C. 



Another version of the maximum detour problem results if we restrict ourselves to vertex-vertex pairs. That is, we are interested in the value
$$d_C^V=\max\{d_C(p,q);\ p,q\ \text{vertices of}\ C\}.$$
One should note that the claim of Lemma 3 does not hold in this case: two vertices attaining maximum detour need not be co-visible. We can prove the following approximation result.

Theorem 1. Let C be a simple polygonal chain of n edges in the plane, and let η > 0. In time O(n log n) we can compute a pair (p, q) of vertices of C satisfying
$$d_C^V\le(1+\eta)\,d_C(p,q).$$

Proof. Let V denote the set of all vertices of C, and let S be a sparse (1 + η)-spanner of V. That means S is a graph of O(n) edges over V, and for any two points p, q of V there exists a path in S whose length is at most 1 + η times the Euclidean distance |pq|.

Now let (p, q) denote a vertex pair for which d_C^V = d_C(p, q) holds, and let p = p_0, p_1, ..., p_k = q be the approximating path in S. Moreover, let C_i denote the segment of C that connects vertex p_i to p_{i+1}. Similar to the proof of Lemma 3, we argue as follows:
we argue as follows: 

Faster algorithms are known, but not necessary for our purpose.

Path length in S is defined as the sum of the Euclidean distances between consecutive vertices on the path.
$$d_C^V=d_C(p,q)=\frac{|C_p^q|}{|pq|}\le\frac{\sum_{i=0}^{k-1}|C_i|}{\frac{1}{1+\eta}\sum_{i=0}^{k-1}|p_ip_{i+1}|}\le(1+\eta)\max_{0\le i\le k-1}\frac{|C_i|}{|p_ip_{i+1}|}=(1+\eta)\max_{0\le i\le k-1}d_C(p_i,p_{i+1}).$$



A similar result was also obtained by Narasimhan and Smid in a more general setting, approximating the stretch factor of a Euclidean path in time O(n log n). Theorem 1 suggests the following algorithm.

Algorithm 1

Input: A polygonal chain C of n edges and a real number η > 0.

Output: A vertex pair of C whose detour is within a factor 1 + η of the maximum vertex detour d_C^V of C.

1. Construct a sparse (1 + η)-spanner of the vertices of C.

2. For each edge of the spanner, compute its detour.

3. Output an edge of the spanner having the largest detour.

By a result of Gutwin and Keil [5], step (1) can be carried out in time O(n log n); see also the spanner survey of Eppstein [?] for alternative methods. Because of the spanner’s sparseness, step (2) takes only linear time; it can be implemented by traversing C, as in the proof of Lemma 6. Hence, the claim follows. □
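Algorithm 1 can be sketched with any (1 + η)-spanner. To keep the example short and self-contained, the sketch below swaps in the classical greedy spanner (sort all vertex pairs by distance and add an edge whenever the current graph has no path of length at most (1 + η) times that distance). This yields a valid (1 + η)-spanner but takes far more than O(n log n) time; it is not the Gutwin and Keil construction. All function names are hypothetical.

```python
import heapq
import math


def path_length_at_most(adj, pts, src, dst, bound):
    """True iff the graph adj has a src-dst path of length <= bound (Dijkstra)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > bound:
            return False
        if u == dst:
            return True
        if d > dist[u]:
            continue                              # stale heap entry
        for v in adj[u]:
            nd = d + math.dist(pts[u], pts[v])
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return False


def greedy_spanner(pts, stretch):
    """Greedy spanner: a slow stand-in for the Gutwin and Keil construction."""
    n = len(pts)
    adj = {i: [] for i in range(n)}
    edges = []
    pairs = sorted((math.dist(pts[i], pts[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for d, i, j in pairs:
        if not path_length_at_most(adj, pts, i, j, stretch * d):
            adj[i].append(j)
            adj[j].append(i)
            edges.append((i, j))
    return edges


def approx_max_vertex_detour(chain, eta):
    """Algorithm 1: the largest detour over the edges of a (1 + eta)-spanner."""
    s = [0.0]
    for a, b in zip(chain, chain[1:]):
        s.append(s[-1] + math.dist(a, b))

    def vertex_detour(e):
        i, j = e
        return abs(s[j] - s[i]) / math.dist(chain[i], chain[j])

    best = max(greedy_spanner(chain, 1 + eta), key=vertex_detour)
    return vertex_detour(best), best
```

Only step (1) depends on the choice of spanner; steps (2) and (3) are the single `max` over the spanner edges.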

Now we can prove our main result. 

Theorem 2. Let C be a simple polygonal chain of n edges in the plane, and let ε > 0. In time O(n log n) we can compute a vertex-edge cut (p, q) of C that approximates the maximum detour of C, i.e., such that
$$d_C\le(1+\varepsilon)\,d_C(p,q).$$

Proof. Let η < ε be small enough to satisfy
$$\frac{1-\eta}{1+\eta}\ge\frac{1}{1+\varepsilon},$$
choose an angle β̄ in (π/2, π) so large that for all angles β ∈ [β̄, π)
$$-\cos\beta\ge\frac{1}{1+\varepsilon}$$
holds, and finally let the angle φ be so small that it satisfies
$$\frac{\sin\varphi}{\sin\bar\beta}\le\eta.$$
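The three parameters can be made explicit. Solving the defining inequalities at equality gives one admissible choice: η = ε/(2 + ε), β̄ = arccos(−1/(1 + ε)), and φ = arcsin(η sin β̄). The helper below is a hypothetical illustration of that choice; it also returns the number of ray directions mφ needed to cover a full turn.

```python
import math


def theorem2_parameters(eps):
    """One admissible choice of (eta, beta_bar, phi) for a given eps > 0,
    plus the number of ray directions m*phi needed to cover a full turn."""
    eta = eps / (2 + eps)                      # (1 - eta)/(1 + eta) = 1/(1 + eps)
    beta_bar = math.acos(-1 / (1 + eps))       # -cos(beta_bar) = 1/(1 + eps)
    phi = math.asin(eta * math.sin(beta_bar))  # sin(phi)/sin(beta_bar) = eta
    directions = math.ceil(2 * math.pi / phi)
    return eta, beta_bar, phi, directions
```

For ε = 0.1 this choice already leads to a few hundred directions. The count depends only on ε, as the running-time argument below requires.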
We run the following algorithm.

Algorithm 2

Input: A polygonal chain C of n edges and a real number ε > 0.

Output: A vertex-edge cut of C whose detour is within a factor 1 + ε of the maximum detour d_C of C.

1. Let φ be defined as above;
for each integer m between 0 and 2π/φ and for each vertex p,
compute the first point of C hit by a ray from p in direction mφ;
move this hit point along its edge so as to maximize the detour locally.

2. Let (p_1, q_1) be the maximum detour cut thus obtained.

3. Let η be as defined above;
compute a pair (p_2, q_2) of vertices satisfying d_C^V ≤ (1 + η) d_C(p_2, q_2).

4. Compute the maximum of d_C(p_1, q_1) and d_C(p_2, q_2), and output the corresponding pair of points.

First, we address the correctness of Algorithm 2. Lemma 3 ensures that d_C is attained by some cut (p, q), where p is a vertex of C. If q is a vertex, too, then
$$d_C=d_C^V\le(1+\eta)\,d_C(p_2,q_2)\le(1+\varepsilon)\,d_C(p_2,q_2)$$
holds for the vertex pair (p_2, q_2) computed in (3), and we are done. Let us assume that q is an interior point of some edge e. By Lemma 1, the outgoing part of e forms an angle β > π/2 with the line segment pq, where
$$\beta=\arccos\Big(-\frac{|pq|}{|C_p^q|}\Big).$$
If β ≥ β̄ then, by definition of β̄, d_C = −1/cos β ≤ 1 + ε holds, and our output is certainly correct. So, let us assume that β ∈ (π/2, β̄). We distinguish two cases illustrated by Fig. 6.





Fig. 6. Two cases in the proof of Theorem 2.
In case (i), there exists an integer m such that the ray emanating from vertex p in direction mφ hits the edge e that contains point q. This ray will be encountered in step (1), and by local maximization, point q on e will be discovered. Hence, the cut (p_1, q_1) computed in step (2) attains the maximum detour d_C.

In case (ii), all rays emanating from vertex p miss edge e, i.e., this edge is fully contained in a wedge of angle φ. Let q' be the endpoint of e first reached from p, and let ν < φ be the angle between pq and pq'. By the law of sines we have
$$\frac{|pq'|}{\sin\beta}=\frac{|qq'|}{\sin\nu}=\frac{|pq|}{\sin(\beta-\nu)},$$
hence
$$d_C(p,q')=\frac{|C_p^q|-|qq'|}{|pq'|}=\frac{\sin(\beta-\nu)}{\sin\beta}\cdot\frac{|C_p^q|}{|pq|}-\frac{\sin\nu}{\sin\beta}\ge d_C-\frac{\sin\varphi}{\sin\bar\beta}\ge d_C-\eta\ge(1-\eta)\,d_C,$$
because of sin(β − ν) ≥ sin β, sin ν ≤ sin φ, and sin β ≥ sin β̄ (and, for the last step, d_C ≥ 1). Since both p and q' are vertices of C, we obtain for the vertex pair (p_2, q_2) computed in step (3)
$$d_C(p_2,q_2)\ge\frac{1}{1+\eta}\,d_C^V\ge\frac{1}{1+\eta}\,d_C(p,q')\ge\frac{1-\eta}{1+\eta}\,d_C\ge\frac{1}{1+\varepsilon}\,d_C.$$



It remains to account for the running time of Algorithm 2. For each fixed direction mφ, step (1) can be implemented to run in time O(n log n) by combining Lemma 1 and Lemma 6. The number of directions to be dealt with is a constant dependent only on ε. Step (3) runs in time O(n log n) by Theorem 1. This completes the proof. □



Taking the result of Gutwin and Keil [5] into account, the overall dependency on 1/ε is not worse than quadratic for small ε.
5 Conclusions

We have presented the first O(n log n) algorithm for approximating the maximum detour of a planar polygonal chain of n edges. This result gives rise to a number of interesting questions. Is the true complexity of this problem less than quadratic? How fast can we compute the maximum detour attained by a pair of co-visible vertices? How can smooth curves be handled? And finally, coming back to the evaluation of transportation routes, if a certain amount of money is available for building shortcuts of total length at most c, how far can the maximum detour be reduced?
Abstract. The problem Minimum Convex Cover of covering a given polygon with a minimum number of (possibly overlapping) convex polygons is known to be NP-hard, even for polygons without holes [?]. We propose a polynomial-time approximation algorithm for this problem for polygons with or without holes that achieves an approximation ratio of O(log n), where n is the number of vertices in the input polygon. To obtain this result, we first show that an optimum solution of a restricted version of this problem, where the vertices of the convex polygons may only lie on a certain grid, contains at most three times as many convex polygons as the optimum solution of the unrestricted problem. As a second step, we use dynamic programming to obtain a convex polygon which is maximum with respect to the number of “basic triangles” that are not yet covered by another convex polygon. We obtain a solution that is at most a logarithmic factor off the optimum by iteratively applying our dynamic programming algorithm. Furthermore, we show that Minimum Convex Cover is APX-hard, i.e., there exists a constant δ > 0 such that no polynomial-time algorithm can achieve an approximation ratio of 1 + δ (unless P = NP). We obtain this result by analyzing and slightly modifying an already existing reduction [?].



1 Introduction and Problem Definition 

The problem Minimum Convex Cover is the problem of covering a given polygon T with a minimum number of (possibly overlapping) convex polygons that lie in T. This problem belongs to the family of classic art gallery problems; it is known to be NP-hard for input polygons with holes [?] and without holes [?]. The study of approximations for hard art gallery problems has rarely led to good algorithms or good lower bounds; we discuss a few exceptions below. In this paper, we propose the first non-trivial approximation algorithm for Minimum Convex Cover. Our algorithm works for both polygons with and without holes. It relies on a strong relationship between the continuous, original problem version and a particular discrete version in which all relevant points are restricted to lie on a kind of grid that we call a quasi-grid. The quasi-grid is the set of intersection points of all lines connecting two vertices of the input polygon. Now, in the Restricted Minimum Convex Cover problem, the vertices of the convex polygons that cover the input polygon may only lie on the quasi-grid. We prove that an optimum solution of the Restricted Minimum Convex Cover problem needs at most three times the number of convex polygons that the Minimum Convex Cover solution needs. To solve the Restricted Minimum Convex Cover problem approximately, we propose a greedy approach: we compute one convex polygon of the solution after the other, and we pick as the next convex polygon one that covers a maximum number of triangles defined on an even finer quasi-grid, where these triangles are not yet covered by previously chosen convex polygons. We propose an algorithm for finding such a maximum convex polygon by means of dynamic programming. To obtain an upper bound on the quality of the solution, we interpret our covering problem on triangles as a special case of the general Minimum Set Cover problem, which takes as input a base set of elements and a collection of subsets of the base set, and asks for a smallest number of subsets in the collection whose union contains all elements of the base set. In our special case, each triangle is an element, and each possible convex polygon is a possible subset in the collection, but not all of these subsets are represented explicitly (there could be an exponential number of subsets). This construction translates the logarithmic quality of the approximation from Minimum Set Cover to Minimum Convex Cover [?].
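The quasi-grid can be computed by brute force with exact rational arithmetic. In the sketch below, we read "lines connecting two vertices" as full lines (not only segments) and do not clip the result to the polygon; both readings are our assumptions, as is the function name. With k vertices there are at most k(k − 1)/2 lines, hence O(k⁴) intersection points.

```python
from fractions import Fraction
from itertools import combinations


def quasi_grid(vertices):
    """All intersection points of lines through two of the given vertices."""
    pts = [(Fraction(x), Fraction(y)) for x, y in vertices]
    lines = []
    for (px, py), (qx, qy) in combinations(pts, 2):
        a, b = qy - py, px - qx                  # line a*x + b*y = c through p, q
        lines.append((a, b, a * px + b * py))
    grid = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det != 0:                             # parallel or identical lines: skip
            grid.add(((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det))
    return grid
```

For the unit square, the only new point besides the four corners is the crossing of the two diagonals at (1/2, 1/2).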

On the negative side, we show that Minimum Convex Cover is APX-hard, i.e., there exists a constant δ > 0 such that no polynomial-time algorithm can achieve an approximation ratio of 1 + δ (unless P = NP). This inapproximability result is based on a known problem transformation [?]; we modify this transformation slightly and show that it is gap-preserving (as defined in [?]).

As for previous work, the related problem of partitioning a given polygon into a minimum number of non-overlapping convex polygons is polynomially solvable for input polygons without holes [2]; it is NP-hard for input polygons with holes, even if the convex partition must be created by cuts from a given family of (at least three) directions […]. Other related results for art gallery problems include approximation algorithms with logarithmic approximation ratios for Minimum Vertex Guard and Minimum Edge Guard […], as well as for the problem of covering a polygon with rectangles in any orientation […]. Furthermore, logarithmic inapproximability results are known for Minimum Point/Vertex/Edge Guard for polygons with holes […], and APX-hardness results for the same problems for polygons without holes […]. The related problem Rectangle Cover of covering a given orthogonal polygon with a minimum number of rectangles can be approximated with a constant ratio for polygons without holes […] and with an approximation ratio of $O(\sqrt{\log n})$ for polygons with holes […]. For additional results see the surveys on art galleries […]. The general idea of using dynamic programming to find maximum convex structures has been used before to solve the problem of finding a maximum (with respect to the number of vertices) empty convex polygon, given a set of vertices in the plane […], and for the problem of covering a polygon with rectangles in any orientation […].

This paper is organized as follows: In Sect. 2 we define the quasi-grid and its refinement into triangles. Section 3 contains the proof of the linear relationship between the sizes of the optimum solutions of the unrestricted and restricted convex cover problems. We propose a dynamic programming algorithm to find a maximum convex polygon in Sect. 4 before showing how to iteratively apply this algorithm to find a convex cover in Sect. 5. In Sect. 6 we present the outline of our proof of the APX-hardness of Minimum Convex Cover. A few concluding remarks can be found in Sect. 7.

Fig. 1. Construction of first-order basic triangles

2 From the Continuous to the Discrete

Consider simple input polygons with and without holes, where a polygon T is given as an ordered list of vertices in the plane. If T contains holes, each hole is also given as an ordered list of vertices. Let $V_T$ denote the set of vertices (including the vertices of holes, if any) of a given polygon T. While, in the general Minimum Convex Cover problem, the vertices of the convex polygons that cover the input polygon can be positioned anywhere in the interior or on the boundary of the input polygon, we restrict their positions in an intermediate step: They may only be positioned on a quasi-grid in the Restricted Minimum Convex Cover problem.

In order to define the Restricted Minimum Convex Cover problem more precisely, we partition the interior of a polygon T into convex components (as proposed in […] for a different purpose) by drawing a line through each pair of vertices of T. We then triangulate each convex component arbitrarily. We call the triangles thus obtained first-order basic triangles. Figure 1 shows in an example the first-order basic triangles of a polygon (thick solid lines) with an arbitrary triangulation (fine solid lines and dashed lines). If a polygon T consists of n vertices, drawing a line through each pair of vertices of T will yield at most $\binom{\binom{n}{2}}{2} \in O(n^4)$ intersection points. Let $V_T^1$ be the set of these intersection points that lie in T (in the interior or on the boundary). Note that $V_T \subseteq V_T^1$. The first-order basic triangles are a triangulation of $V_T^1$ inside T; therefore the number of first-order basic triangles is also $O(n^4)$. The Restricted Minimum Convex Cover problem asks for a minimum number of convex polygons, with vertices restricted to $V_T^1$, that together cover the input polygon T. We call $V_T^1$ a quasi-grid that is imposed on T. For solving the Restricted Minimum Convex Cover problem, we make use of a finer quasi-grid: Simply partition T by drawing lines through each pair of points from $V_T^1$. This again yields convex components, and we triangulate them again arbitrarily. This higher-resolution partition yields $O(n^{16})$ intersection points, which define the set $V_T^2$. We call the resulting triangles second-order basic triangles. Obviously, there are $O(n^{16})$ second-order basic triangles. Note that $V_T \subseteq V_T^1 \subseteq V_T^2$.
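A small Python sketch of the first-order quasi-grid construction may help make the counting concrete. It assumes integer vertex coordinates and a polygon without holes (the text also allows holes), uses exact rational arithmetic, and does not build the triangulation into basic triangles; the helper names (`line_through`, `intersect`, `in_polygon`, `quasi_grid`, `grid_bound`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through p and q."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    return (a, b, a * p[0] + b * p[1])


def intersect(l1, l2):
    """Exact intersection point of two lines, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))


def on_segment(pt, p, q):
    """True if pt lies on the closed segment from p to q."""
    cross = (q[0] - p[0]) * (pt[1] - p[1]) - (q[1] - p[1]) * (pt[0] - p[0])
    return (cross == 0
            and min(p[0], q[0]) <= pt[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= pt[1] <= max(p[1], q[1]))


def in_polygon(pt, poly):
    """Closed point-in-polygon test (boundary counts as inside), no holes."""
    inside = False
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        if on_segment(pt, p, q):
            return True
        if (p[1] > pt[1]) != (q[1] > pt[1]):          # edge crosses the ray's height
            x = p[0] + (pt[1] - p[1]) * Fraction(q[0] - p[0], q[1] - p[1])
            if x > pt[0]:
                inside = not inside
    return inside


def quasi_grid(poly):
    """Intersections of all vertex-pair lines that lie in poly (the set V_T^1)."""
    lines = [line_through(p, q) for p, q in combinations(poly, 2)]
    points = set()
    for l1, l2 in combinations(lines, 2):
        x = intersect(l1, l2)
        if x is not None and in_polygon(x, poly):
            points.add(x)
    return points


def grid_bound(n):
    """The counting bound binom(binom(n, 2), 2) from the text."""
    return comb(comb(n, 2), 2)
```

For the square with vertices (0, 0), (2, 0), (2, 2), (0, 2), the six lines through vertex pairs produce only the four corners and the center (1, 1), well below the bound of 15; for the L-shaped polygon (0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2), intersection points such as (2, 2) that fall outside the polygon are discarded.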

3 The Optimum Solution of Minimum Convex Cover vs. 
the Optimum Solution of Restricted Minimum Convex 
Cover 

The quasi-grids $V_T^1$ and $V_T^2$ serve the purpose of making the computation of a convex cover efficient while at the same time guaranteeing that the cover on the discrete quasi-grid is not much worse than the desired cover in continuous space. The following theorem proves the latter.

Theorem 1. Let T be an arbitrary simple input polygon with n vertices. Let 
OPT denote the size of an optimum solution of Minimum Convex Cover 
with input polygon T and let OPT' denote the size of an optimum solution of 
Restricted Minimum Convex Cover with input polygon T. Then: 

$OPT' \le 3 \cdot OPT$

Proof. We proceed as follows: We show how to expand a given, arbitrary convex polygon C to another convex polygon C' with $C \subseteq C'$ by iteratively expanding edges. We then replace the vertices in C' by vertices from $V_T^1$, which results in a (possibly) non-convex polygon C'' with $C' \subseteq C''$. Finally, we describe how to obtain three convex polygons $C''_1, C''_2, C''_3$ with $C'' = C''_1 \cup C''_2 \cup C''_3$ that only contain vertices from $V_T^1$. This will complete the proof, since each convex polygon from an optimum solution of Minimum Convex Cover can be replaced by at most 3 convex polygons that are in a solution of Restricted Minimum Convex Cover. Following this outline, let us present the proof details.

Let C be an arbitrary convex polygon inside polygon T. Let the vertices of C be given in clockwise order. We obtain a series of convex polygons $C^0, C^1, \ldots, C'$ with $C = C^0 \subseteq C^1 \subseteq \cdots \subseteq C'$, where $C^{i+1}$ is obtained from $C^i$ as follows (see Fig. 2):

Let a, b, c, d be consecutive vertices (in clockwise order) in the convex polygon $C^i$ that lies inside polygon T. Let $b, c \notin V_T$, with b and c not on the same edge of T. Then, the edge (b, c) is called expandable. If there exists no expandable edge in $C^i$, then $C^i = C'$, which means we have found the end of the series of convex polygons. If (b, c) is an expandable edge, we expand the edge from vertex b to vertex c as follows:

- If b does not lie on the boundary of T, then we let a point p start on b and move on the halfline through a and b away from b until either one of two events happens: p lies on the line through c and d, or the triangle p, c, b touches the boundary of T. Fix p as soon as the first of these events happens. Figure 2 shows a list of all possible cases, where the edges from polygon T are drawn as thick edges: Point p either lies on the intersection point of the lines from a through b and from c through d as in case (a), or there is a vertex $v_l$ of T on the line segment from p to c as in case (b), or p lies on an edge of T as in case (c).
Fig. 2. Expansion of edge (b,c)



- If b lies on the boundary of T, i.e., on some edge of T, say from $v_k$ to $v_{k+1}$, then let p move as before, except that the direction of the move is now on the way from $v_k$ through b up until $v_{k+1}$ at most (instead of the ray from a through b). Figure 2 shows a list of all possible cases: Point p either lies at vertex $v_{k+1}$ as in case (d), or on the intersection point of the lines from b to $v_{k+1}$ and from d through c as in case (e), or there is a vertex $v_l$ of T on the line segment from p to c as in case (f).

A new convex polygon is obtained by simply adding point p as a vertex in the ordered set of vertices of $C^i$ between the two vertices b and c. Furthermore, eliminate all vertices of this new polygon that have collinear neighbors and that are not vertices in $V_T$.

Note that an edge from two consecutive vertices b and c with $b, c \notin V_T$ can always be expanded in such a way that the triangle b, p, c that is added to the convex polygon is non-degenerate, i.e., has non-zero area, unless b and c both lie on the same edge of polygon T. This follows from the cases (a)-(f) of Fig. 2.

Now, let $C^{i+1}$ be the new polygon if either a new vertex of T has been added to it in the expansion of the edge, which is true in cases (b), (d), and (f), or the number of its vertices that are not vertices of T has decreased, which is true in case (a). If p is as in case (c), we expand the edge (p, c), which will result in either case (d), (e), or (f). Note that in cases (d) and (f), we have found $C^{i+1}$. If p is as in case (e), we expand the edge (p, d), which will result in either case (d), (e), or (f). If it is case (e) again, we repeat the procedure by expanding the edge from p and the (clockwise) successor of d. This needs to be done at most $|C^i|$ times, since the procedure will definitely stop once it gets to vertex a. Therefore, we obtain $C^{i+1}$ from $C^i$ in a finite number of steps. Let $\tau_i$ denote the number of vertices in $C^i$ that are also vertices in T and let $f_i$ be the number of vertices in $C^i$ that are not vertices in T. Now note that $\Phi(i) = f_i - 2\tau_i + 2n$ is a function that bounds the number of remaining steps, i.e., it strictly decreases with every increase in i and cannot become negative (since $f_i \ge 0$ and $\tau_i \le n$). The existence of this bounding function implies the finiteness of the series $C^0, C^1, \ldots, C'$ of convex polygons.

Fig. 3. Replacing non-T-vertices

By definition, there are no expandable edges left in C'. Call a vertex of C' a T-vertex if it is a vertex in T. From the definition of expandable edges, it is clear that there can be at most two non-T-vertices between any two consecutive T-vertices in C', and if there are two non-T-vertices between two consecutive T-vertices, they must both lie on the same edge in T. Let the T-vertices in C' be $t_1, \ldots, t_l$ in clockwise order, and let the non-T-vertices between $t_i$ and $t_{i+1}$ be $nt_{i,1}$ and $nt_{i,2}$ if they exist. We now replace each non-T-vertex $nt_{i,j}$ in C' by one or two vertices $nt^1_{i,j}$ and $nt^2_{i,j}$ that are both elements of $V_T^1$. This will transform the convex polygon C' into a (possibly) non-convex polygon C'' (we will show later how C'' can be covered by at most three convex polygons $C''_1, C''_2, C''_3$).

To this end, let a, b, c be the first-order basic triangle in which non-T-vertex $nt_{i,j}$ lies, as illustrated in Fig. 3. Points a, b, c are all visible from both vertices $t_i$ and $t_{i+1}$. To see this, assume by contradiction that the view from, say, $t_i$ to a is blocked by an edge e of T. Since $nt_{i,j}$ must see $t_i$, the edge e must contain a vertex e' in the triangle $t_i, a, nt_{i,j}$, but then a cannot be a vertex of the first-order basic triangle in which $nt_{i,j}$ lies, since the line from vertex $t_i$ through vertex e' would cut through the first-order basic triangle, an impossibility. Now, let $d_i$ be the intersection point of the line from $t_{i-1}$ through $t_i$ and the line from $t_{i+1}$ through $t_{i+2}$. With similar arguments, the triangle $t_i, d_i, t_{i+1}$ completely contains triangle a, b, c.

Assume that only one non-T-vertex exists between $t_i$ and $t_{i+1}$. If the triangle formed by $t_i$, $t_{i+1}$, and a completely contains the triangle $t_i, nt_{i,1}, t_{i+1}$, we let $nt^1_{i,1} = a$, and likewise for b and c (see Fig. 3(b)). Otherwise, we let $(nt^1_{i,1}, nt^2_{i,1})$ be (a, b), (a, c), or (b, c) as in Fig. 3(a), such that the polygon $t_i, nt^1_{i,1}, nt^2_{i,1}, t_{i+1}$ is convex and completely contains the triangle $t_i, nt_{i,1}, t_{i+1}$. This is always possible by the definition of points a, b, c.

Now, assume that two non-T-vertices $nt_{i,1}$ and $nt_{i,2}$ exist between $t_i$ and $t_{i+1}$. From the definition of C', we know that $nt_{i,1}$ and $nt_{i,2}$ must lie on the same edge e of T. Therefore, the basic triangle in which $nt_{i,1}$ lies must contain a vertex a either at or preceding $nt_{i,1}$ on edge e along T in clockwise order. Let $nt^1_{i,1} = a$. The basic triangle in which $nt_{i,2}$ lies must contain a vertex b either at $nt_{i,2}$ or succeeding $nt_{i,2}$ on edge e. Let $nt^1_{i,2} = b$. See Fig. 3(c). Note that the convex polygon $t_i, nt^1_{i,1}, nt^1_{i,2}, t_{i+1}$ completely contains the polygon $t_i, nt_{i,1}, nt_{i,2}, t_{i+1}$.



An Approximation Algorithm for Minimum Convex Cover 339 








Fig. 5. Dynamic Programming 

Fig. 4. Covering C'' with three convex polygons



After applying this change to all non-T-vertices in C', we obtain a (possibly) non-convex polygon C''. First, assume that C'' contains an odd number l of T-vertices. We let $C''_1$ be the polygon defined by vertices $t_i$, $nt^k_{i,j}$, and $t_{i+1}$ for all j, k and for all odd i, but $i \ne l$. By construction, $C''_1$ is convex. Let $C''_2$ be the polygon defined by vertices $t_i$, $nt^k_{i,j}$, and $t_{i+1}$ for all j, k and for all even i. Finally, let $C''_3$ be the polygon defined by vertices $nt^k_{l,j}$ and $t_i$ for all i, j, k. Figure 4 shows an example. Obviously, $C''_1$, $C''_2$, and $C''_3$ are convex and together cover all of C''. Second, assume that C'' contains an even number of T-vertices; then we cover it with only two convex polygons using the same concept. This completes the proof.
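The index bookkeeping at the end of the proof of Theorem 1 can be sketched in Python. Here a "pocket" (our term, not the paper's) is the group of replaced vertices $nt^k_{i,j}$ between $t_i$ and $t_{i+1}$, with indices taken cyclically. The sketch only checks the combinatorial fact the construction relies on, namely that no polygon receives two pockets sharing a T-vertex, which an odd cycle of pockets cannot achieve with only two polygons. It does not test geometric convexity, and the function names are hypothetical.

```python
def assign_pockets(l):
    """Map pocket i (between t_i and t_{i+1}, cyclic, 1 <= i <= l) to polygon 1, 2 or 3."""
    owner = {}
    for i in range(1, l + 1):
        if l % 2 == 1 and i == l:
            owner[i] = 3          # last pocket of an odd cycle goes to C''_3
        elif i % 2 == 1:
            owner[i] = 1          # odd i (excluding l when l is odd): C''_1
        else:
            owner[i] = 2          # even i: C''_2
    return owner


def neighbours_separated(owner, l):
    """True if cyclically adjacent pockets (sharing a T-vertex) never share a polygon."""
    return all(owner[i] != owner[i % l + 1] for i in range(1, l + 1))
```

For every cycle length from 2 upwards, adjacent pockets land in different polygons, and exactly three polygons are used when the number of T-vertices is odd, two when it is even.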

4 Finding Maximum Convex Polygons 

Assume that each second-order basic triangle from a polygon T is assigned a weight value of either 1 or 0. In this section, we present an algorithm using dynamic programming that computes the convex polygon M in a polygon T that contains a maximum number of second-order basic triangles with weight 1 and that only has vertices from $V_T^1$. For simplicity, we call such a polygon a maximum convex polygon. The weight of a polygon M is defined as the sum of the weights of the second-order basic triangles in the polygon and is denoted by |M|. We will later use the algorithm described below to iteratively compute a maximum convex polygon with respect to the triangles that are not yet covered, to eventually obtain a convex cover for T.

Let $a, b, c \in V_T^1$. Let $P_{a,b,c}$ denote the maximum convex polygon that:

- contains only vertices from $V_T^1$, and
- contains vertices a, b, c in counterclockwise order, and
- has a as its left-most vertex, and
- contains additional vertices only between vertices a and b.

Given three vertices $a, b, c \in V_T^1$, let A be the (possibly infinite) area of points that are:

- to the right of vertex a, and
- to the left of the line oriented from b through a, and
- to the left of the line oriented from b through c.

For an illustration, see Fig. 5. Let $P'_{a,b,c} = \max_{d \in A \cap V_T^1} P_{a,d,b} \cup \Delta_{a,b,c}$, where, to simplify notation, max selects a polygon $P_{a,d,b}$ of maximum weight.



Lemma 1. $P_{a,b,c} = P'_{a,b,c}$ if triangle a, b, c is completely contained in the polygon T.

Proof. Consider $P_{a,b,c}$, which is maximum by definition. $P_{a,b,c}$ must contain additional vertices between a and b (otherwise the lemma is trivially true). Let d' be the predecessor of b in the counterclockwise order of $P_{a,b,c}$. Vertex d' must lie in area A as defined above; otherwise the polygon a, d', b, c would either be non-convex, not have a as its left-most vertex, or not be in the required counterclockwise order. Now consider $P'' = P_{a,b,c} - \Delta_{a,b,c}$. From the definition of area A it is clear that P'' can only contain vertices that lie in A. Now $P_{a,d',b}$ is maximum by definition, and it is considered when computing $P'_{a,b,c}$.

Let M be a maximum convex polygon for a polygon T with weights assigned to the second-order basic triangles. Let a be the left-most vertex of M, let c be the predecessor of a in M in counterclockwise order, and let b be the predecessor of c. Then $|P_{a,b,c}| = |M|$ by definition.

We will now use Lemma 1 to construct an algorithm, which takes as input a polygon T and an assignment of weight 0 or 1 to each second-order basic triangle of T and computes the maximum convex polygon. To this end, we fix vertex $a \in V_T^1$. Let a' be a point with the same x-coordinate and smaller y-coordinate than a. Now, order all other vertices $b \in V_T^1$ to the right of a according to the angle formed by b, a, a'. Let the resulting ordered set be B and let B' be the empty set. Take the smallest element b from B, remove it from B, and add it to set B'; then, for all $c \in V_T^1 \setminus B'$ to the right of a, compute the weight $|\Delta_{a,b,c}|$ of the triangle a, b, c and compute $P_{a,b,c}$ according to Lemma 1. Compute the weight $|P_{a,b,c}|$ by adding $|\Delta_{a,b,c}|$ to $|P_{a,d,b}|$, where d is the maximizing argument. Repeat this until B is empty. Note that the computation of $P_{a,b,c}$ according to Lemma 1 is always possible, since all possible vertices d in $P_{a,d,b}$ lie to the left of the line from b to a (see also the definition of area A) and therefore have smaller angles d, a, a' than b, a, a', so $P_{a,d,b}$ has already been computed. The algorithm is executed for every $a \in V_T^1$, and the maximum convex polygon found is returned.
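The dynamic program can be sketched in Python for a plain point set. This is a simplified illustration under stated assumptions, not the authors' implementation: the candidate vertices are given directly (standing in for $V_T^1$), the weight function is pluggable (the usage example uses twice the triangle area instead of counting weight-1 second-order basic triangles; both are additive over the fan of triangles from a), the test "triangle a, b, c lies in T" is an optional predicate `valid` that defaults to always true, and points are assumed to be in general position. `P[(b, c)]` plays the role of $P_{a,b,c}$ for the current left-most vertex a, and the loop over d implements the maximum over area A from Lemma 1; all function names are hypothetical.

```python
from fractions import Fraction


def cross(o, p, q):
    """Twice the signed area of triangle (o, p, q); positive iff counterclockwise."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def _reconstruct(P, a, b, c):
    """Follow predecessor pointers back from P_{a,b,c} to list its vertices."""
    chain = [c, b]
    while P[(b, c)][1] is not None:
        b, c = P[(b, c)][1], b
        chain.append(b)
    return [a] + chain[::-1]


def max_convex_polygon(points, weight, valid=lambda a, b, c: True):
    """Return (weight, counterclockwise vertex list) of a maximum convex polygon."""
    best_w, best_poly = None, None
    for a in points:                      # a is the left-most vertex
        # Vertices right of a, ordered by the angle b, a, a' (a' directly below a),
        # which for points right of a is the order of increasing slope.
        right = sorted((p for p in points if p[0] > a[0]),
                       key=lambda p: Fraction(p[1] - a[1], p[0] - a[0]))
        P = {}  # (b, c) -> (|P_{a,b,c}|, predecessor d of b, or None)
        for i, b in enumerate(right):
            for c in right[i + 1:]:
                if cross(a, b, c) <= 0 or not valid(a, b, c):
                    continue              # triangle a, b, c unusable
                ext_w, ext_d = 0, None    # default: triangle a, b, c alone
                for d in right[:i]:       # smaller angle, so P[(d, b)] is ready
                    in_area = cross(b, a, d) > 0 and cross(b, c, d) > 0
                    if in_area and (d, b) in P and P[(d, b)][0] > ext_w:
                        ext_w, ext_d = P[(d, b)][0], d
                P[(b, c)] = (weight(a, b, c) + ext_w, ext_d)
                if best_w is None or P[(b, c)][0] > best_w:
                    best_w, best_poly = P[(b, c)][0], _reconstruct(P, a, b, c)
    return best_w, best_poly
```

With area as the weight, the maximum convex polygon over a point set is its convex hull, which gives an easy check: for the points (0, 1), (2, 0), (4, 2), (3, 4), (1, 3) and the interior point (2, 2), the result is the pentagon of doubled area 18, and (2, 2) is not used.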

Note that |T| = n, $|V_T^1| = O(n^4)$, and $|V_T^2| = O(n^{16})$. Ordering $O(n^4)$ vertices takes $O(n^4 \log n)$ time. Computing the weight of a triangle takes $O(n^{16})$ time. Computing $P_{a,b,c}$ takes $O(n^4)$ time. We have to compute the weight of $O(n^8)$ triangles, which gives a total time of $O(n^{24})$. Finally, we have to execute our algorithm for each $a \in V_T^1$, which gives a total running time of $O(n^{28})$. Space requirements are $O(n^{12})$ by using pointers.
5 An Approximation Algorithm for Minimum Convex
Cover 

Given a polygon T, we obtain a convex cover by iteratively applying the algorithm for computing a maximum convex polygon from Sect. 4. It works as follows for an input polygon T.

1. Let all second-order basic triangles have weight 1. Let $S = \emptyset$.

2. Find the maximum convex polygon M of polygon T using the algorithm from Sect. 4, and add M to the solution S. Decrease the weight of all second-order basic triangles that are contained in M to 0.¹

3. Repeat step 2 until there are no second-order basic triangles with weight 1 left. Return S.

To obtain a performance guarantee for this algorithm, consider the Minimum Set Cover instance I, which has all second-order basic triangles as elements and in which, for each convex polygon in T that only contains vertices from $V_T^1$, the second-order basic triangles of that polygon form a set in I. The greedy heuristic for Minimum Set Cover achieves an approximation ratio of $1 + \ln n'$, where n' is the number of elements in I […], and it works in exactly the same way as our algorithm. However, we do not have to (and could not afford to) compute all the sets of the Minimum Set Cover instance I (their number would be exponential in n'): It suffices to always compute a set that contains a maximum number of elements not yet covered by the solution thus far. This is achieved by reducing the weights of the second-order basic triangles already in the solution to 0; i.e., a convex polygon with maximum weight is such a set.
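The correspondence with greedy Minimum Set Cover can be made concrete with a short Python sketch in which the collection of sets is never listed, only queried through an oracle that returns a set covering the most still-uncovered elements; in the algorithm above, that role is played by the dynamic program of Sect. 4 run with covered triangles set to weight 0. The names `greedy_cover` and `window_oracle` are hypothetical, and the toy collection of integer windows merely stands in for the implicitly represented convex polygons.

```python
def greedy_cover(universe, best_set):
    """Greedy Minimum Set Cover over an implicitly given collection.

    best_set(uncovered) must return a member of the collection that covers
    a maximum number of still-uncovered elements.
    """
    uncovered = set(universe)
    solution = []
    while uncovered:
        chosen = best_set(frozenset(uncovered))
        if not uncovered & chosen:
            raise ValueError("the collection does not cover the universe")
        solution.append(chosen)
        uncovered -= chosen          # covered elements now count for nothing
    return solution


def window_oracle(n, k):
    """Toy implicit collection: all windows {s, ..., s+k-1} inside range(n)."""
    def best(uncovered):
        windows = (frozenset(range(s, s + k)) for s in range(n - k + 1))
        return max(windows, key=lambda w: len(w & uncovered))
    return best
```

Covering the elements 0, ..., 9 with windows of length 3 this way picks {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, and finally {7, 8, 9}, whose only new element is 9.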

Note that $n' = O(n^{16})$. Therefore, our algorithm achieves an approximation ratio of $O(\log n)$ for Restricted Minimum Convex Cover on input polygon T. Because of Theorem 1, we know that the solution found for Restricted Minimum Convex Cover is also a solution for the unrestricted Minimum Convex Cover that is at most a factor of $O(\log n)$ off the optimum solution.
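Spelled out, the guarantee chains the greedy bound with Theorem 1. Writing S for the returned solution and OPT' for the optimum of Restricted Minimum Convex Cover (which is also the optimum of the Set Cover instance I), a sketch of the arithmetic is:

$$|S| \le (1 + \ln n')\,OPT' \le 3\,(1 + \ln n')\,OPT, \qquad n' = O(n^{16}) \implies 1 + \ln n' \le 16 \ln n + O(1),$$

so $|S| \le (48 \ln n + O(1)) \cdot OPT = O(\log n) \cdot OPT$.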

As for the running time of this algorithm, observe that the algorithm adds to the solution in each round a convex polygon with non-zero weight. Therefore, there can be at most $O(n^{16})$ rounds, which, with the $O(n^{28})$ time per round from Sect. 4, yields a total running time of $O(n^{44})$. This completes the proof of our main theorem:

Theorem 2. Minimum Convex Cover for input polygons with or without 
holes can be approximated by a polynomial time algorithm with an approximation 
ratio of $O(\log n)$, where n is the number of polygon vertices.



6 APX-Hardness of Minimum Convex Cover 

The upper bound of $O(\log n)$ on the approximation ratio for Minimum Convex Cover is not matched by our lower bound: We will now prove that there is a constant lower bound on the approximation ratio, and hence a gap remains. More precisely, we prove Minimum Convex Cover to be APX-hard. Our proof of the APX-hardness of Minimum Convex Cover for input polygons with or without holes uses the construction that is used to prove the NP-hardness of this problem for input polygons without holes² […]. However, we reduce the problem Maximum 5-Occurrence-3-Sat rather than SAT (as done in the original reduction [3]) to Minimum Convex Cover, and we design the reduction to be gap-preserving […]. Maximum 5-Occurrence-3-Sat is the variant of SAT where each variable may appear at most 5 times in clauses and each clause contains at most 3 literals. Maximum 5-Occurrence-3-Sat is APX-complete […].

¹ Note that by the definition of second-order basic triangles, a second-order basic triangle is either completely contained in M or completely outside M.

Theorem 3. Let I be an instance of Maximum 5-Occurrence-3-Sat consisting of n variables, m clauses with a total of l literals, and let I' be the corresponding instance of Minimum Convex Cover. Let OPT be the maximum number of satisfied clauses of I by any assignment of the variables. Let OPT' be the minimum number of convex polygons needed to cover the polygon of I'. Then:

$$OPT = m \implies OPT' = 5l + n + 1$$
$$OPT \le (1 - 15\epsilon)m \implies OPT' \ge 5l + n + 1 + \epsilon n$$

Proof. Theorem 3 is proved by showing how to transform the convex polygons of a solution of the Minimum Convex Cover instance I' in such a way that their total number does not increase and in such a way that a truth assignment of the variables can be "inferred" from the convex polygons that satisfies the desired number of clauses. The proof employs concepts similar to those used in […]; we do not include details, due to space limitations.

In the promise problem of Maximum 5-Occurrence-3-Sat as described above, we are promised that either all clauses are satisfiable or at most a fraction of $1 - 15\epsilon$ of the clauses is satisfiable, and we are to find out which of the two possibilities is true. This problem is NP-hard for small enough values of $\epsilon > 0$ […]. Therefore, Theorem 3 implies that the promise problem for Minimum Convex Cover, where we are promised that the minimum solution contains either $5l + n + 1$ convex polygons or at least $5l + n + 1 + \epsilon n$ convex polygons, is NP-hard as well, for small enough values of $\epsilon > 0$. Therefore, unless P = NP, Minimum Convex Cover cannot be approximated in polynomial time with a ratio smaller than

$$\frac{5l + n + 1 + \epsilon n}{5l + n + 1} = 1 + \frac{\epsilon n}{5l + n + 1} \ge 1 + \frac{\epsilon n}{25n + n + n} = 1 + \frac{\epsilon}{27},$$

where we have used that $l \le 5n$ and $n \ge 1$. This establishes the following:

Theorem 4. Minimum Convex Cover on input polygons with or without 
holes is APX-hard. 



7 Conclusion 

We have proposed a polynomial-time approximation algorithm for Minimum Convex Cover that achieves an approximation ratio that is logarithmic in the number of vertices of the input polygon. This has been achieved by showing that there is a discretized version of the problem using no more than three times the number of cover polygons. The discretization may shed some light on the long-standing open question of whether the decision version of the Minimum Convex Cover problem is in NP […]. We know now that convex polygons of optimum solutions only contain a polynomial number of vertices and that a considerable fraction of these vertices are actually vertices from the input polygon. Apart from the discretization, our algorithm applies a Minimum Set Cover approximation algorithm to a Minimum Set Cover instance with an exponential number of sets that are represented only implicitly, through the geometry. We propose an algorithm that picks a best of the implicitly represented sets with a dynamic programming approach, and hence runs in polynomial time. This technique may prove to be of interest for other problems as well. Moreover, by showing APX-hardness, we have eliminated the possibility of the existence of a polynomial-time approximation scheme for this problem (unless P = NP). However, polynomial-time algorithms could still achieve constant approximation ratios. Whether our algorithm is the best asymptotically possible is therefore an open problem. Furthermore, our algorithm has a rather excessive running time of $O(n^{44})$, and it is by no means clear whether this can be improved substantially.

² APX-hardness for Minimum Convex Cover for input polygons without holes implies APX-hardness for the same problem for input polygons with holes.

Acknowledgement. We want to thank anonymous referees for pointing us to […] and for additional suggestions.
Abstract. Let G = (V, E) be a graph on n vertices and let $\Delta$ denote the maximum degree in G. We present a distributed algorithm that finds an $O(\Delta \log n)$-edge-coloring of G in time $O(\log^{\dots} n)$.



1 Introduction 

In this paper, we consider the problem of edge-coloring a graph in a distributed model of computation. In our model a network is represented by an undirected graph G = (V, E) where each vertex represents a processor of the network and an edge corresponds to a connection between processors. We assume full synchronization of the network: in every step, each processor sends messages to all its neighbors, receives messages from all of its neighbors, and can perform some local computations. The number of steps should be polylogarithmic in the size of the graph, and in addition we insist that the local computations of each processor are performed in polynomial time. The above model is more restrictive than the classical distributed model introduced by Linial in [Li92]. In Linial's model there is no restriction on local computations performed by processors (for example, processors can perform computations in exponential time). By default, all processors have different IDs, and each processor knows |V|, the number of vertices in G, and $\Delta(G)$, the maximum degree in G.

In the edge-coloring problem the goal of a distributed algorithm is to properly color the edges of G in a polylogarithmic (in n = |V|) number of steps. The main difficulty of designing such an algorithm comes from the fact that in such a "short time" a vertex v can learn only about vertices and edges that are within a "small" distance from v, and based on this local information, a proper coloring of E must be obtained. Let $\Delta$ denote the maximum degree of G. By Vizing's theorem there is a proper edge-coloring of G with $\Delta + 1$ colors, but the known proofs of this theorem don't lead to a distributed algorithm. It is therefore natural to aim for an algorithm that uses $O(\Delta)$ colors. In [Li92], Linial presented an algorithm which, in $\log^* n$ steps, colors the vertices of G using $O(\Delta^2)$ colors. Linial's procedure can be used to obtain an $O(\Delta^2)$-edge-coloring of G. Very recently, De Marco and Pelc claimed that an algorithm presented in [MaPe01] colors the vertices of G with $O(\Delta)$ colors. Unfortunately, no complete proof of the main lemma in [MaPe01] has been offered. In addition, in their algorithm the amount of local computations is not polylogarithmically bounded.

* This work was supported by KBN GRANT 7 T11C 032 20

In this paper, we present a distributed algorithm which colors the edges of a graph with $O(\Delta \log n)$ colors. Our approach is based on computing a family of spanners of G. It turns out that this family can be used to color a constant fraction of the edges of G using $O(\Delta)$ colors. Iterating this process $O(\log n)$ times leads to a proper coloring of E. However, in each iteration a palette of $O(\Delta)$ new colors is needed. Spanners were previously successfully used by Hanckowiak, Karonski and Panconesi [HKP99] to design a distributed algorithm for a maximal matching problem.
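A back-of-the-envelope sketch of why $O(\log n)$ iterations suffice, assuming that every iteration colors at least a fixed fraction $c \in (0, 1)$ of the still-uncolored edges ($E_t$, the uncolored edges after t iterations, and c are our notation): since $|E| < n^2$,

$$|E_t| \le (1 - c)^t\,|E| < (1 - c)^t\, n^2 \le 1 \quad \text{whenever} \quad t \ge \frac{2 \ln n}{\ln \frac{1}{1 - c}},$$

so after $O(\log n)$ iterations no edge is left, and with a fresh palette of $O(\Delta)$ colors per iteration the total is $O(\Delta \log n)$ colors.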

The rest of this paper is structured as follows. In Section 2, we present a 
procedure that constructs a family of spanners. Section 3 contains the description 
of our main algorithm and the proof of its correctness. 



2 Family of Spanners 

In this section, we present an algorithm that finds a family of spanners that are used to color our graph. The main idea of the algorithm is as follows. Suppose for a moment that all vertices in the graph have degrees that are powers of two. In order to color the edges, we find a subgraph such that each vertex has degree (in this subgraph) equal to half of its original degree. We assign one to the edges of the subgraph and zero to the remaining edges. Next we repeat the procedure in the "one-subgraph" and in the "zero-subgraph". As a result we obtain a sequence of zeros and ones on each edge. This sequence is the color of the edge. Note that in the distributed model we cannot split a graph in such an exact way, but we can do it approximately.
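To make the idealized halving concrete, here is a centralized Python sketch for the special case of a D-regular bipartite graph with D a power of two. It swaps in a classical sequential technique that the text does not use: alternating bits along an Euler circuit of each component, which halves every degree exactly because circuits in a bipartite graph have even length. It is not the distributed procedure of this paper, and the names `euler_split` and `halving_coloring` are hypothetical.

```python
from collections import defaultdict


def euler_split(edges):
    """Split the edges of a bipartite graph whose vertices all have even degree.

    edges: dict edge_id -> (u, v). Returns (ones, zeros), sets of edge ids,
    obtained by alternating bits along an Euler circuit of each component.
    """
    adj = defaultdict(list)
    for eid, (u, v) in edges.items():
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    used, ptr = set(), defaultdict(int)
    ones, zeros = set(), set()
    for start in list(adj):
        stack, circuit = [(start, None)], []   # iterative Hierholzer algorithm
        while stack:
            v, via = stack[-1]
            while ptr[v] < len(adj[v]) and adj[v][ptr[v]][1] in used:
                ptr[v] += 1                    # skip edges already traversed
            if ptr[v] == len(adj[v]):
                stack.pop()
                if via is not None:
                    circuit.append(via)        # edges come out in circuit order
            else:
                w, eid = adj[v][ptr[v]]
                used.add(eid)
                stack.append((w, eid))
        for pos, eid in enumerate(circuit):    # even-length circuit: alternate bits
            (ones if pos % 2 == 0 else zeros).add(eid)
    return ones, zeros


def halving_coloring(edges, rounds):
    """Give every edge a bit string of length `rounds` by recursive halving."""
    color = {eid: "" for eid in edges}
    classes = [set(edges)]
    for _ in range(rounds):
        next_classes = []
        for cls in classes:
            ones, zeros = euler_split({e: edges[e] for e in cls})
            for e in ones:
                color[e] += "1"
            for e in zeros:
                color[e] += "0"
            next_classes += [ones, zeros]
        classes = next_classes
    return color
```

On the complete bipartite graph $K_{4,4}$, two rounds produce four bit strings, and each color class is a perfect matching, so the coloring is proper.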

Let us start with some definitions. A bipartite graph H = (A, B, E) is called a D-block if for every vertex $a \in A$,

$$\frac{D}{2} < \deg_H(a) \le D.$$



Definition 1. An $(\alpha, \beta)$-spanner of a D-block H = (A, B, E) is a subgraph S = (A', B, E') of H such that the following conditions are satisfied.

1. $|A'| \ge \alpha |A|$.

2. For every vertex $a \in A'$, $\deg_S(a) = 1$.

3. For every vertex $b \in B$, $\deg_S(b) \le \frac{\beta}{D} \deg_H(b) + 1$.
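A direct Python transcription of the D-block condition and of Definition 1 may serve as a checker for small examples. It assumes the D-block condition $D/2 < \deg_H(a) \le D$, takes A' to be the set of A-vertices that have an edge in S, and represents edges as (a, b) pairs with a in A and b in B; the function names are hypothetical.

```python
from collections import Counter


def is_d_block(A, edges, D):
    """D/2 < deg_H(a) <= D for every a in A."""
    deg = Counter(a for a, _ in edges)
    return all(D / 2 < deg[a] <= D for a in A)


def is_spanner(A, B, H_edges, S_edges, D, alpha, beta):
    """Check the three conditions of an (alpha, beta)-spanner S of H."""
    if not set(S_edges) <= set(H_edges):
        return False                     # S must be a subgraph of H
    deg_H = Counter(b for _, b in H_edges)
    deg_S_A = Counter(a for a, _ in S_edges)
    deg_S_B = Counter(b for _, b in S_edges)
    A_prime = set(deg_S_A)               # assumption: A' = A-vertices used by S
    return (len(A_prime) >= alpha * len(A)                       # condition 1
            and all(deg_S_A[a] == 1 for a in A_prime)            # condition 2
            and all(deg_S_B[b] <= beta / D * deg_H[b] + 1        # condition 3
                    for b in B))
```

In the complete bipartite graph on {a1, a2} and {b1, b2} (a 2-block), a perfect matching is a (1, 1)-spanner, while two edges at a1 violate condition 2.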
Procedure FindSpanners finds a family of $O(D)$ edge-disjoint $(\alpha, \beta)$-spanners in a D-block for some constants $\alpha$ and $\beta$. In each iteration of the main loop of FindSpanners we invoke a procedure Splitter that, in parallel, assigns to each edge e the labels $bit(e) \in \{0, 1\}$ and $bad(e) \in \{Yes, No\}$. As a result, on every edge e we obtain a sequence of bits that will be denoted by BitsSeq(e). In each iteration, we add (concatenate) a new bit to BitsSeq(e), increasing its length by one. During the execution of the algorithm some of the edges will be excluded from further consideration. Such edges will be marked by concatenating the letter "N" to the end of BitsSeq(e). When FindSpanners quits, the sequences BitsSeq define a family of spanners. If the sequence BitsSeq(e) does not end with "N", then BitsSeq(e) is the number of a spanner that contains edge e. If the sequence BitsSeq(e) ends with "N", then e does not belong to any spanner of the family. By $S_{\langle i_1, \ldots, i_k \rangle}$, where $i_j \in \{0, 1\}$, we denote the subgraph of the D-block H = (A, B, E) induced by those edges e for which BitsSeq(e) starts with the sequence $\langle i_1, \ldots, i_k \rangle$. By $S_{\langle\rangle}$ we denote the whole block H. Let $N_J(v)$ denote the set of neighbors of v in a graph J and let $d_J(v) = |N_J(v)|$.

Procedure FindSpanners

1. For $j := 0, \ldots, \log D - 3$ do:
   In parallel, for every subgraph $J := S_{\langle i_1, \ldots, i_j \rangle}$, where $\langle i_1, \ldots, i_j \rangle$ is an arbitrary sequence of bits, do:
   - Invoke procedure Splitter in J, which determines two functions:
     $bit : E(J) \to \{0, 1\}$
     $bad : E(J) \to \{Yes, No\}$
   - In parallel, for every $v \in A$, do:
     • If the number of edges {v, u}, $u \in N_J(v)$, such that bad(v, u) = Yes is larger than $d_J(v)/\log n$, then for every $u \in N_J(v)$, do:
       BitsSeq(v, u) := BitsSeq(v, u) ∘ "N",
     • else, for every $u \in N_J(v)$, do:
       BitsSeq(v, u) := BitsSeq(v, u) ∘ bit(v, u).

2. Let $j := \log D - 4$. In parallel, for every subgraph $J := S_{\langle i_1, \ldots, i_j \rangle}$, where $\langle i_1, \ldots, i_j \rangle$ is an arbitrary sequence of bits, do:
   For every vertex $v \in V(J) \cap A$ do:
   - If $d_J(v) \ge 2$ then change the jth bit in all but one of the edges incident to v to "N".

3. The result of this procedure is a set of subgraphs $S_{\langle i_1, \ldots, i_{\log D - 4} \rangle}$, where $\langle i_1, \ldots, i_{\log D - 4} \rangle$ is an arbitrary sequence of bits. Every such subgraph is a spanner.
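The bookkeeping of step 1 and the final reading of BitsSeq can be simulated sequentially in Python. This is a sketch under assumptions: Splitter is not implemented (its `bit` and `bad` outputs are passed in), Yes/No is encoded as True/False, log is taken to base 2, the trimming of step 2 is omitted, and `extend_bits` and `spanner_groups` are hypothetical names.

```python
import math
from collections import defaultdict


def extend_bits(bits_seq, neighbours, bit, bad, n):
    """Sequential simulation of the step-1 update for one subgraph J.

    bits_seq: edge (v, u) -> str; neighbours: v in A -> list N_J(v);
    bit: edge -> 0/1; bad: edge -> bool (True stands for "Yes").
    """
    for v, nbrs in neighbours.items():
        n_bad = sum(1 for u in nbrs if bad[(v, u)])
        if n_bad > len(nbrs) / math.log2(n):   # too many bad edges at v
            for u in nbrs:
                bits_seq[(v, u)] += "N"        # exclude every edge of v
        else:
            for u in nbrs:
                bits_seq[(v, u)] += str(bit[(v, u)])


def spanner_groups(bits_seq):
    """Group edges by BitsSeq; sequences ending in "N" belong to no spanner."""
    groups = defaultdict(list)
    for edge, seq in bits_seq.items():
        if not seq.endswith("N"):
            groups[seq].append(edge)
    return dict(groups)
```

With n = 16 the threshold is $d_J(v)/4$: a vertex of degree 4 with one bad edge keeps its bits, while one with two bad edges has all four of its edges excluded.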



Before we describe procedure Splitter, let us define a vertex-splitting operation. For a vertex v let $e_0, \ldots, e_{k-1}$ denote the edges incident to v. We replace v by vertices $v_0, \ldots, v_{\lfloor k/2\rfloor - 1}$, where each $v_i$ has degree two. If k is odd, then in addition we add one vertex $v_{\lfloor k/2\rfloor}$ of degree one. Then, for $i \le \lfloor k/2\rfloor - 1$, the edges $e_{2i}, e_{2i+1}$ are incident to $v_i$, and if k is odd then $e_{k-1}$ is incident to $v_{\lfloor k/2\rfloor}$.
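The vertex-splitting operation is purely local, so it can be sketched in a few lines. The function name and the choice to label the new vertices as pairs `(v, i)` are illustrative assumptions; the edge order at v is arbitrary, as in the text.

```python
def split_vertex(v, incident_edges):
    """Vertex-splitting operation (sketch).

    incident_edges: list [e_0, ..., e_{k-1}] of edges incident to v, in an
    arbitrary fixed order. Returns a dict mapping each new vertex (v, i) to
    the edges it inherits: e_{2i}, e_{2i+1} for i < floor(k/2), plus e_{k-1}
    alone on (v, floor(k/2)) when k is odd.
    """
    k = len(incident_edges)
    pieces = {}
    for i in range(k // 2):
        pieces[(v, i)] = [incident_edges[2 * i], incident_edges[2 * i + 1]]
    if k % 2 == 1:
        # Odd degree: one extra vertex of degree one.
        pieces[(v, k // 2)] = [incident_edges[k - 1]]
    return pieces
```

Every new vertex has degree at most two, which is why the split graph is a union of paths and cycles.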
After splitting the vertices we obtain a graph of maximum degree two, that
is, a union of paths and cycles. Some of the vertices on a cycle (or path) will be
marked as border vertices. Paths that connect two consecutive border vertices
are called segments. Also the path-segments that connect an endpoint of a path
with its closest border vertex are called segments.

In the procedure Splitter, given below, we will use procedures from
[HKP99] to partition paths and cycles into segments of length at least $\ell :=
400\log^2 n$, in addition marked by an alternating sequence of bits 0 and 1. Using
the algorithms from [HKP99], such a partition can be found distributively in time
$O(\log^2 n)$.



Procedure Splitter(J)

1. Split every vertex of graph J into groups of vertices of degree two (and
possibly one of degree one) to obtain a union of paths and cycles.

2. Using procedures from [HKP00] (see procedures LongArrows and Splitter
there), find in time $O(\log^2 n)$ segments of length greater than or equal
to $\ell := 400\log^2 n$. In addition, the procedures from [HKP99] assign bits 0
and 1 to the edges of a segment, so that edges that are adjacent in the segment
have different bits assigned to them.

3. For every vertex w in the union of paths and cycles that corresponds to some
vertex $v \in V(J) \cap B$ in the original graph, do: If w is a border vertex and
both edges incident to w have the same bit assigned, then flip the value of
one of them (i.e., $0 \mapsto 1$ or $1 \mapsto 0$).

4. As a result two functions are obtained:
   $bit : E(J) \to \{0, 1\}$
   $bad : E(J) \to \{Yes, No\}$,
   where bit(e) is equal to the bit assigned to edge e after step 3.
   If e is incident to a border vertex then put bad(e) = Yes, else bad(e) = No.
   (In particular, if e is the end edge of a path then bad(e) = No.)
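Steps 3 and 4 of Splitter can be sketched on a single path, taking the alternating bits from step 2 as given input (we do not reimplement LongArrows or the [HKP99] segment procedure). The interface below is hypothetical: edges $e_0, e_1, \ldots$ are indexed along the path, and internal vertex $w_i$ joins $e_{i-1}$ and $e_i$.

```python
def splitter_steps_3_4(bits, border, in_b):
    """Steps 3-4 of Splitter on one path (sketch, hypothetical interface).

    bits:   bits[i] in {0, 1} assigned to edge e_i in step 2 (assumed already
            alternating inside every segment).
    border: set of indices i such that internal vertex w_i, which joins
            e_{i-1} and e_i, is a border vertex.
    in_b:   set of indices i such that w_i is a copy of a vertex of B.
    Returns (bit, bad) as lists indexed by edge.
    """
    bit = list(bits)
    for i in sorted(border & in_b):
        if bit[i - 1] == bit[i]:
            bit[i] = 1 - bit[i]          # step 3: flip one of the two edges at w_i
    bad = [False] * len(bit)
    for i in border:
        bad[i - 1] = bad[i] = True       # step 4: edges at border vertices are bad
    return bit, bad
```

Flipping only edges at border vertices touches only edges that are bad anyway, which is consistent with the worst-case accounting in the proof of Lemma 1.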



An edge e is called bad if bad(e) = Yes. If an edge is not bad, then it is called
good.

A vertex $v \in V(J) \cap A$ is called nasty if there are more than $d_J(v)/\log n$ bad
edges incident to v. Otherwise the vertex is called pliable. Observe that if an edge
e is incident to a nasty vertex then FindSpanners assigns "N" to e.

Let $\langle i_1,\ldots,i_j,i_{j+1}\rangle$ be a sequence of bits (i.e., $i_k \in \{0,1\}$).

Denote $J := S_{\langle i_1,\ldots,i_j\rangle}$, $d_j(v) := d_J(v)$, and $d_{j+1}(v) := d_{S_{\langle i_1,\ldots,i_{j+1}\rangle}}(v)$.

In addition, let $P_j := V(J) \cap A$ and $P_{j+1} := V(S_{\langle i_1,\ldots,i_{j+1}\rangle}) \cap A$.

Note that $P_{j+1}$ is the set of pliable vertices in graph J obtained by Splitter
in the jth iteration of FindSpanners.

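Lemma 1. For every vertex $v \in P_{j+1}$,
$$\frac{1}{2}\Bigl(1 - \frac{2}{\log n}\Bigr) d_j(v) - 1 \;\le\; d_{j+1}(v) \;\le\; \frac{1}{2}\Bigl(1 + \frac{2}{\log n}\Bigr) d_j(v) + 1, \qquad (1)$$
and for every $v \in B$,
$$d_{j+1}(v) \le \frac{1}{2}\bigl(d_j(v) + 1\bigr). \qquad (2)$$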
Proof. Let $e_+$ and $e_-$ denote the number of good and bad edges (respectively)
incident to v. Of course $d_j(v) = e_+ + e_-$. First we prove (1). Let $v \in P_{j+1}$ and let
$v_1, \ldots, v_k$ denote the vertices obtained from v by the vertex-splitting operation. Then,
since v is pliable, $e_- \le d_j(v)/\log n$. To obtain the upper bound for $d_{j+1}(v)$,
observe that the worst case arises if the following conditions are satisfied:
there is no vertex $v_i$ which is incident to two bad edges, the vertices from $\{v_1,\ldots,v_k\}$
that are incident to one bad edge have bit $i_{j+1}$ on both of their edges,
$d_j(v)$ is odd, and the bit on the edge incident to $v_k$ (the vertex of degree one) is
$i_{j+1}$. Consequently,
$$d_{j+1}(v) \le 2e_- + \frac{e_+ - e_- - 1}{2} + 1 \le \frac{1}{2}\Bigl(1 + \frac{2}{\log n}\Bigr) d_j(v) + 1.$$
To obtain a lower bound for $d_{j+1}(v)$, notice that the worst scenario for this case
arises if the following conditions are satisfied: there is no vertex $v_i$ that is
incident to two bad edges, the vertices from $\{v_1,\ldots,v_k\}$ that are incident to one
bad edge have bit $1 - i_{j+1}$ on both of their edges, $d_j(v)$ is odd, and
the bit on the edge incident to $v_k$ is $1 - i_{j+1}$. Then,
$$d_{j+1}(v) \ge \frac{e_+ - e_- - 1}{2} \ge \frac{1}{2}\Bigl(1 - \frac{2}{\log n}\Bigr) d_j(v) - 1.$$
Inequality (2) follows from step 3 of procedure Splitter.



Lemma 2. In the jth iteration of procedure FindSpanners ($j = 0, \ldots, \log D - 3$), the following condition is satisfied:
$$|P_{j+1}| \ge |P_j|\Bigl(1 - \frac{\Delta_j}{200\log n\,\delta_j}\Bigr),$$
where $\Delta_j = \max\{d_j(v) : v \in P_j\}$ and $\delta_j = \min\{d_j(v) : v \in P_j\}$.

Proof. The proof is the same as the proof of a similar lemma in [HKP00]. Recall
that $P_{j+1}$ is the set of pliable vertices obtained after the execution of Splitter
in the jth iteration of procedure FindSpanners. Let $N_{j+1}$ be the set of nasty
vertices and let $be[N_{j+1}]$ denote the number of bad edges incident to vertices
from $N_{j+1}$. Observe that $|P_{j+1}| + |N_{j+1}| = |P_j|$. First we establish a lower
bound for $be[N_{j+1}]$. Note that if $v \in N_{j+1}$ then there are at least $\delta_j/\log n$ bad
edges incident to v. Therefore,
$$be[N_{j+1}] \ge |N_{j+1}|\,\frac{\delta_j}{\log n}.$$
Observe that $be[N_{j+1}]$ can be bounded from above by the number of all bad
edges in graph J. Since $|E(J)| \le \Delta_j|P_j|$, and every segment has length at least
$\ell = 400\log^2 n$ and contains at most two bad edges,
$$be[N_{j+1}] \le \frac{2\Delta_j|P_j|}{400\log^2 n}.$$
Combining the last two inequalities yields
$$|N_{j+1}| \le \frac{\Delta_j}{200\log n\,\delta_j}\,|P_j|,$$
and the lemma follows.

Theorem 1. Let $H = (A, B, E)$ be a D-block. Procedure FindSpanners finds
in $O(\log^3 n)$ rounds a family of $D/16$ spanners of H.

Proof. There are $O(\log D)$ iterations in procedure FindSpanners. In each iteration,
the main computations are performed by Splitter, which runs in $O(\log^2 n)$
rounds. Thus the number of rounds is $O(\log^3 n)$.

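Recall that $d_j(v) := d_{S_{\langle i_1,\ldots,i_j\rangle}}(v)$ and $P_j := V(S_{\langle i_1,\ldots,i_j\rangle}) \cap A$. Let $k := \log D - 4$.

We will show that for every sequence of bits $\langle i_1,\ldots,i_k\rangle$, the graph $S := S_{\langle i_1,\ldots,i_k\rangle}$
is a $(\frac{1}{2}, 16)$-spanner. Hence, we have to verify that S satisfies the following three
conditions.

- $|P_k| \ge \frac{1}{2}|P_0|$

- $\forall v \in P_k:\ d_k(v) = 1$

- $\forall v \in B:\ d_k(v) \le 16\,d_0(v)/D + 1$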

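First we observe that the third condition is satisfied. By (2) in Lemma 1, we
see that for every $v \in B$,
$$d_k(v) \le \frac{1}{2^k}\bigl(d_0(v) - 1\bigr) + 1 = \frac{16}{D}\bigl(d_0(v) - 1\bigr) + 1,$$
which shows that the third condition is satisfied. Applying Lemma 1 (part (1))
k times shows that for every $v \in P_k$,
$$q_-^k\bigl(d_0(v) + 1\bigr) - 1 \le d_k(v) \le q_+^k\bigl(d_0(v) - 1\bigr) + 1,$$
where $q_\pm := \frac{1}{2}\bigl(1 \pm \frac{2}{\log n}\bigr)$ are the factors in (1).

Since $D/2 \le d_0(v) \le D$ for all $v \in A$, we have that for every $v \in P_k$,
$$q_-^k(D/2 + 1) - 1 \le d_k(v) \le q_+^k(D - 1) + 1.$$
So, since $2^k = D/16$ and $k \le \log n$, for sufficiently large n,
$$q_-^k(D/2 + 1) - 1 \ge 8\Bigl(1 - \frac{2}{\log n}\Bigr)^{\log n} - 1 > 0$$
and
$$q_+^k(D - 1) + 1 \le 16e^2 + 1 < 120.$$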
Thus for every $v \in P_k$, $d_k(v) > 0$. Since in step 2 of FindSpanners we disregard
all but one edge incident to v, the second condition is satisfied. Finally we verify
that the first condition is satisfied as well. Indeed, apply Lemma 2 k times. Then



$$|P_k| \ge |P_0|\prod_{j=0}^{k-1}\Bigl(1 - \frac{\Delta_j}{200\log n\,\delta_j}\Bigr).$$



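Note that
$$\frac{\Delta_j}{\delta_j} \le \frac{q_+^j(D-1) + 1}{q_-^j(D/2 + 1) - 1} \le 120,$$
since the fraction in the middle is an increasing function of j for $j \le k$. Therefore,
for n sufficiently large, we have
$$|P_k| \ge |P_0|\Bigl(1 - \frac{3}{5\log n}\Bigr)^{k} \ge \frac{1}{2}|P_0|.$$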
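Two of the numeric steps in this proof can be checked directly. The sketch below assumes logarithms base 2 and uses hypothetical helper names: `b_degree_bound` iterates the bound (2) k times, and `pliable_fraction_bound` evaluates the factor $(1 - 3/(5\log n))^k$ obtained when every ratio $\Delta_j/\delta_j$ is at most 120.

```python
def b_degree_bound(d0, k):
    """Apply the bound (2), d_{j+1} <= (d_j + 1) / 2, k times starting at d0."""
    d = float(d0)
    for _ in range(k):
        d = (d + 1) / 2
    return d

def pliable_fraction_bound(log_n, k):
    """(1 - 120 / (200 log n))^k = (1 - 3 / (5 log n))^k, the factor that
    multiplies |P_0| when every ratio Delta_j / delta_j is at most 120."""
    return (1 - 120 / (200 * log_n)) ** k
```

For $D = 1024$ (so $k = 6$) and $d_0(v) = D$, the iterated bound is $(1024 - 1)/64 + 1 = 16.984375 \le 17 = 16\,d_0(v)/D + 1$. The product factor with $k = \log n$ stays above $1/2$ as soon as $\log n \ge 3$, and smaller k only increases it.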




3 Algorithm 



In this section, we present and analyze our edge-coloring algorithm. The algorithm
runs in $O(\log^4 n)$ rounds and uses $O(\Delta\log n)$ colors. The algorithm
can be divided into four smaller procedures: FindSpanners, ColorSpanner,
ColorBipartite, and ColorGraph. Procedure FindSpanners finds a family of
edge-disjoint spanners of a block and was presented in the previous section. Let
$H = (A, B, E)$ be a D-block, $m = |A| + |B|$, $\alpha = \frac{1}{2}$ and $\beta = 16$. Then, by Theorem 1,
FindSpanners finds in $O(\log^3 m)$ rounds a family of $D/\beta$ edge-disjoint
$(\alpha,\beta)$-spanners. First, we describe a procedure that colors a family of spanners
found by FindSpanners. Note that a single spanner is simply a collection of
vertex-disjoint stars, with centers of stars in set B, and so the vertices of B (in
parallel) can color the edges of the spanner provided they have enough colors available.
We need some additional notation. Let
$$K := \Bigl\lceil \frac{\beta\Delta}{D} \Bigr\rceil + 1,$$
and for $j := 1, \ldots, D/\beta$ let
$$I_j := \{(j-1)K + 1, \ldots, jK\}.$$

Procedure ColorSpanner

1. Find a collection of spanners $S_1, \ldots, S_{D/\beta}$ using FindSpanners.

2. In parallel for every vertex $b \in B$ do:

   For every $j := 1, \ldots, D/\beta$ do:

   - Color the edges of $S_j$ incident to b using colors from $I_j$.
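Step 2 of ColorSpanner can be sketched sequentially; in the distributed setting each centre b does the same work for its own star. The representation of a spanner as a list of `(a, b)` edges and the function name are assumptions for illustration, and `K` is assumed to bound every star degree, as in the text.

```python
def color_spanners(spanners, K):
    """Step 2 of ColorSpanner (sketch).

    spanners: list [S_1, ..., S_{D/beta}]; each spanner is a list of edges
              (a, b) with b in B, forming vertex-disjoint stars centred in B.
    K:        palette size, assumed >= the maximum star degree.
    Spanner j (1-based) uses the palette I_j = {(j-1)K + 1, ..., jK}.
    """
    color = {}
    for j, spanner in enumerate(spanners, start=1):
        used = {}                               # next free offset per centre b
        for a, b in spanner:
            offset = used.get(b, 0)
            if offset >= K:
                raise ValueError("star at %r has degree > K" % (b,))
            color[(a, b)] = (j - 1) * K + 1 + offset
            used[b] = offset + 1
    return color
```

Disjoint palettes make colors from different spanners automatically distinct, so properness only has to be checked inside one star.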






A. Czygrinow, M. Haiickowiak, and M. Karonski 



Lemma 3. Let $H = (A, B, E)$ be a D-block, $m = |A| + |B|$. Procedure ColorSpanner
colors at least $\frac{\alpha}{\beta}|E|$ edges of H using $O(\Delta)$ colors and runs in
$O(\log^3 m)$ rounds.

Proof. Indeed, since every $(\alpha,\beta)$-spanner contains at least $\alpha|A|$ vertices of
degree one in A, the number of edges in the family of spanners is at least
$\alpha|A|\frac{D}{\beta} \ge \frac{\alpha}{\beta}|E|$ (as $|E| \le D|A|$). Also, for every $j = 1, \ldots, D/\beta$ and every vertex $b \in B$,
$\deg_{S_j}(b) \le K = |I_j|$, and so edges within one spanner are colored properly.
Since we use different colors to color different spanners, the coloring obtained by
ColorSpanner is proper. Hence $O(\Delta)$ colors suffice, because at each vertex
of B we need $\frac{D}{\beta}K$ colors and $D \le \Delta$.

We can now consider bipartite graphs. Let $G = (L, R, E)$ be a bipartite graph
with $|L| = |R| = n$. For $e = \{u, v\} \in E$ with $u \in L$, $v \in R$ define the weight of e as
$$w(e) := \deg_G(u) \qquad (3)$$
and let
$$\omega(E') := \sum_{e \in E'} w(e) \qquad (4)$$
be the total weight of a set $E' \subseteq E$. Trivially, we have $\omega(E) \le n^3$, since each of the at most $n^2$ edges has weight at most n.

Let $D_i := \Delta/2^i$, where $i = 0, \ldots, \log\Delta$, and consider $D_i$-blocks $H_i = (A_i, B, E_i)$,
where
$$A_i := \{u \in L : D_i/2 < \deg_G(u) \le D_i\}, \qquad B := R,$$
and $E_i$ contains all edges of G that have an endpoint in $A_i$.

Procedure ColorBipartite

1. In parallel, for $i := 0, \ldots, \log\Delta$, color every $H_i$ using ColorSpanner.

2. In parallel for every vertex $v \in R$ do:

   For every color c do:

   a) Define $E_{c,v}$ to be the set of edges incident to v that are colored with
   color c.

   b) Find in $E_{c,v}$ an edge e with maximum weight and uncolor all edges in
   $E_{c,v}$ except e.
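The correction in step 2 of ColorBipartite is a group-by followed by a maximum. A minimal sketch, assuming the coloring after step 1 is available as a dictionary from edges `(u, v)` with $u \in L$, $v \in R$ to colors (the function name is hypothetical):

```python
from collections import defaultdict

def keep_heaviest_per_color(colored_edges, weight):
    """Step 2 of ColorBipartite (sketch).

    colored_edges: dict mapping edge (u, v), u in L, v in R, to its color
                   after step 1.
    weight:        dict mapping each edge to w(e) = deg_G(u).
    For each v in R and color c, keeps the heaviest edge of E_{c,v} and
    uncolors the rest (ties are broken by max(), i.e. arbitrarily).
    """
    groups = defaultdict(list)                  # (v, c) -> E_{c,v}
    for (u, v), c in colored_edges.items():
        groups[(v, c)].append((u, v))
    kept = {}
    for (v, c), edges in groups.items():
        best = max(edges, key=lambda e: weight[e])
        kept[best] = c
    return kept
```

After this step every vertex of R sees each color at most once, which is exactly what makes the coloring proper.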

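Lemma 4. Let $G = (L, R, E)$ be a bipartite graph with $n = |L| = |R|$. Procedure
ColorBipartite properly colors a set $\tilde E \subseteq E$ such that $\omega(\tilde E) \ge \alpha\,\omega(E)/(6\beta)$
using $O(\Delta)$ colors and runs in $O(\log^3 n)$ rounds.

Proof. The sets $A_i$ are disjoint, and so after step 1 of the algorithm there will be no
edges of the same color that are incident to the same vertex of L. After step 2, every
vertex $v \in R$ has at most one edge of a given color incident to it. Let $\tilde E_i \subseteq E_i$
denote the set of edges of the ith block that are colored in step 1. From Lemma
3, $|\tilde E_i| \ge \alpha|E_i|/\beta$, and since the minimum degree in $A_i$ is larger than $D_i/2$, we
have $\omega(\tilde E_i) \ge \alpha|E_i|D_i/(2\beta)$. Consequently, since $\omega(E_i) \le |E_i|D_i$,
$$\omega(\tilde E_0 \cup \cdots \cup \tilde E_{\log\Delta}) \ge \frac{\alpha}{2\beta}\sum_i |E_i|D_i \ge \frac{\alpha}{2\beta}\,\omega(E).$$

We need to argue that at least one third of this weight is maintained after the
"correction" done in the second step of the procedure. Fix a vertex $v \in R$ and a
color c. Let $M := M(v,c)$ denote the maximum weight of an edge incident to
v in color c, and let $H_{i_0}$ be the block containing that edge, so that $D_{i_0}/2 < M \le D_{i_0}$.
Since edges incident to v that have the same color must belong to
different blocks, and a block $H_i$ with $D_i > D_{i_0}$ contains only edges of weight greater
than $D_{i_0} \ge M$, the weight of the edges incident to v that have color c is less than
$$M + \sum_{i > i_0} D_i < M + D_{i_0} < 3M.$$
Therefore, the total weight of edges that remain colored is at least $\frac{1}{3}\cdot\frac{\alpha}{2\beta}\omega(E) = \frac{\alpha}{6\beta}\omega(E)$.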

Before we proceed, we need one additional concept. Let $P = (V, E)$ be a path
or cycle on at least two vertices. A partition $V_1, \ldots, V_t$ of V is called short if the
following conditions are satisfied.

1. Every graph $P_i = P[V_i]$ induced by $V_i$ is a path or cycle.

2. For every $i = 1, \ldots, t$, $2 \le |V_i| = O(\log|E|)$.

3. For every $i = 1, \ldots, t-1$, $|V_i \cap V_{i+1}| = 1$. If P is a cycle then $|V_1 \cap V_t| = 1$.
Vertices that belong to $V_i \cap V_{i+1}$ for some i are called border vertices. In
[AGLP89], the following fact is proved.

Lemma 5. Let $P = (V, E)$ be a path or cycle with $|V| \ge 2$. There is a procedure
that finds a short partition of P in $O(\log|V|)$ rounds.

Let us now consider an arbitrary graph $G = (V, E)$. Our first task is to obtain an
auxiliary bipartite graph from G which will be colored using ColorBipartite.
Define a bipartite graph $G' := (L, R, E')$ as follows. Every vertex v splits itself into
two vertices $(v,0)$ and $(v,1)$. Let $L := \{(v,0) : v \in V\}$, $R := \{(v,1) : v \in V\}$,
and $\{(u,0),(v,1)\} \in E'$ if and only if $u < v$ and $\{u,v\} \in E$.
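The construction of G' and its edge weights can be sketched as follows, assuming a simple graph whose vertex labels are comparable (the order $u < v$ is taken from the labels); the function name is hypothetical. The weight of $\{(u,0),(v,1)\}$ is $\deg_{G'}((u,0))$, as in (3).

```python
def double_cover(vertices, edges):
    """Build G' = (L, R, E') from a simple graph G = (V, E) (sketch).

    vertices: comparable labels; u < v is the order used in the text.
    edges:    iterable of 2-tuples (u, v), each undirected edge listed once.
    Returns L, R, E' and w(e) = deg_{G'}((u, 0)) for e = ((u, 0), (v, 1)).
    """
    L = [(v, 0) for v in vertices]
    R = [(v, 1) for v in vertices]
    E_prime = []
    for u, v in edges:
        u, v = min(u, v), max(u, v)       # orient every edge from smaller to larger
        E_prime.append(((u, 0), (v, 1)))
    deg_left = {}
    for left, _ in E_prime:
        deg_left[left] = deg_left.get(left, 0) + 1
    weight = {e: deg_left[e[0]] for e in E_prime}
    return L, R, E_prime, weight
```

On a triangle 1, 2, 3 the edge {1, 2} gets weight 2 (vertex 1 has the larger neighbors 2 and 3) and {2, 3} gets weight 1, matching the definition of the weight on G given below.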

Procedure ColorGraph

1. Obtain $G' = (L, R, E')$ from $G = (V, E)$ as described above.

2. Apply ColorBipartite to G'.

3. Obtain a coloring of G by merging the vertices $(v,0)$, $(v,1)$ that correspond to
a single vertex $v \in V$. (This operation can result in some monochromatic
paths and cycles.)

4. In parallel for every monochromatic path and cycle P do:

   a) Find a short partition of P into segments $P_1, \ldots, P_t$.

   b) In parallel for every segment $P_i$ do:

   - Find a matching $M_1$ that saturates all but at most one vertex in $P_i$
   and let $M_2 := E(P_i) \setminus M_1$.

   - If $\omega(M_1) \ge \omega(M_2)$ then uncolor the edges of $M_2$. Otherwise uncolor
   the edges of $M_1$.



   c) In parallel for every border vertex v do:

   - If both edges incident to v have the same color then uncolor the edge
   with the smaller weight.
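Step 4(b) of ColorGraph can be sketched for one segment. The sketch assumes the segment is a path whose edges are listed in path order (an assumption; on an odd cycle the even-indexed edges would not form a matching), and the function name is hypothetical. Taking every other edge gives a matching that misses at most one vertex, and keeping the heavier half retains at least half the weight.

```python
def thin_segment(segment_edges, weight):
    """Step 4(b) of ColorGraph on one monochromatic segment (sketch).

    segment_edges: edges of the path P_i in path order e_0, e_1, ...
    M_1 = {e_0, e_2, ...} is a matching saturating all but at most one
    vertex of the path; M_2 = E(P_i) minus M_1 = {e_1, e_3, ...}.
    Returns the edges that stay colored (the heavier of M_1 and M_2).
    """
    m1 = segment_edges[0::2]
    m2 = segment_edges[1::2]
    w1 = sum(weight[e] for e in m1)
    w2 = sum(weight[e] for e in m2)
    return m1 if w1 >= w2 else m2
```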



The weight $w(\{u, v\})$ of an edge $\{u, v\} \in E$ with $u < v$ is defined as the number
of vertices $w \in V$ such that $u < w$ and $\{u, w\} \in E$. Thus the weight of $\{u, v\}$ is
exactly equal to the weight of the edge $\{(u,0),(v,1)\} \in E'$. In the same way as in
(4) we can define $\omega(\tilde E)$ for any set $\tilde E \subseteq E$.

Lemma 6. Let $G = (V, E)$ be a graph with $n = |V|$. Procedure ColorGraph
colors a set $\tilde E \subseteq E$ such that $\omega(\tilde E) \ge \alpha\,\omega(E)/(24\beta)$ using $O(\Delta)$ colors and runs
in $O(\log^3 n)$ rounds.

Proof. From Lemma 4 we know that after step 2 we colored a set $\tilde E' \subseteq E'$
such that $\omega(\tilde E') \ge \alpha\,\omega(E')/(6\beta)$. Thus the total weight of edges that remain
colored after step 4(b) is at least $\omega(\tilde E')/2$. Finally, the total weight of edges that
remain colored after step 4(c) is at least $\omega(\tilde E')/4$. Since $\omega(E') = \omega(E)$, procedure
ColorGraph colors a set $\tilde E$ such that $\omega(\tilde E) \ge \alpha\,\omega(E)/(24\beta)$.

Iterating ColorGraph $O(\log n)$ times yields a proper coloring of the graph $G =
(V, E)$.

Procedure Color
1. Run $O(\log n)$ times:

   - Use procedure ColorGraph with a palette of $O(\Delta)$ new colors (different
   from previously used colors) to properly color a set $\tilde E \subseteq E$.

   - Let $E := E \setminus \tilde E$.



Theorem 2. Let $G = (V, E)$ be a graph with $n = |V|$. Procedure Color properly
colors the edges of G using $O(\Delta\log n)$ colors and runs in $O(\log^4 n)$ rounds.

Proof. By Lemma 6, in each iteration of the procedure $\tilde E$ is properly colored
using $O(\Delta)$ colors. Since Color uses different colors in each iteration, the obtained
coloring is proper.

To complete the proof, we show that E is empty after $O(\log n)$ iterations
of the main loop in Color. Notice that $E \ne \emptyset$ is equivalent to $\omega(E) \ge 1$.
Let $p := \bigl(1 - \frac{\alpha}{24\beta}\bigr)^{-1}$ and let $\omega$ denote the weight of the edge set of graph G.
By Lemma 6, the total weight of edges left after the kth iteration is at most
$\omega/p^k$. Since $\omega \le n^3$, the right-hand side of the last inequality is less than one if
$$k > \frac{\log n^3}{\log p}.$$
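The iteration count in this proof can be made concrete. The sketch below assumes $\alpha = \frac{1}{2}$ and $\beta = 16$ (the spanner parameters used in this section) and the bound $\omega \le n^3$; it returns the smallest k with $n^3/p^k < 1$, which grows linearly in $\log n$ but with a large constant factor $3/\log p$.

```python
import math

def iterations_needed(n, alpha=0.5, beta=16):
    """Smallest integer k with n**3 / p**k < 1, where p = 1/(1 - alpha/(24 beta)).

    alpha = 1/2 and beta = 16 are the spanner parameters used in the text;
    n**3 is the trivial upper bound on the initial total weight.
    """
    p = 1 / (1 - alpha / (24 * beta))
    return math.floor(3 * math.log(n) / math.log(p)) + 1
```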
Abstract. We examine the replica placement aspect of a distributed
peer-to-peer file system that replicates and stores files on ordinary desktop
computers. It has been shown that some desktop machines are available
for a greater fraction of time than others, and it is crucial not to
place all replicas of any file on machines with low availability. In this paper
we study the efficacy of three hill-climbing algorithms for file replica
placement. Based on large-scale measurements, we assume that the
distribution of machine availabilities is uniform. Among other results we
show that the MinMax algorithm is competitive, and that for growing
replication factor the MinMax and MinRand algorithms have the same
asymptotic worst-case efficacy.



1 Introduction 

Farsite PH is a distributed peer-to-peer file system that replicates and stores 
files on ordinary desktop computers rather than on dedicated storage servers. 
Multiple replicas are created so that a user can access a file if at least one of 
the machines holding a replica of that file is accessible. It has been shown ^ 
that some desktop machines are available for a greater fraction of time than 
others, and it is crucial not to place all replicas of any file on machines with low 
availability, or the availability of that file will suffer. 

In earlier work [2], we evaluated the efficacy and efficiency of three hill-climbing
algorithms for file replica placement, using competitive analysis and
simulation. The scenario under consideration was a static problem in which the 
availability of each machine was fixed, and each replica stably remained on the 
machine to which the placement algorithm assigned it. Our study found that 
algorithmic efficiency and efficacy ran counter to each other: The algorithm 
with the highest rate of improvement yielded a final placement with the poorest 
quality relative to an optimal placement. 

In actual practice, the replica placement problem is not static. The availabil- 
ity of each machine (defined loosely as the fraction of time it is accessible) varies 

* Due to lack of space we omit most of the proofs in this extended abstract. The 
complete paper is available as Microsoft Research technical report MSR-TR-2001-62.
over time as user behavior changes. In addition, file replicas may be evicted
from machines by other processes in the system. The replica placement algo- 
rithm does not produce a static final placement that thereafter persists; rather, 
it continuously operates to correct for dynamic changes in the system. Viewed 
from this dynamic perspective, extensive Monte Carlo simulation shows that 
the MinMax algorithm consistently out-performs the other two algorithms, even 
though it was proven [2] to be non-competitive. Hence, our theoretic worst-case
competitive analysis opposes use of the algorithm that appears best in practice. 

We thus face an apparent dilemma: Either we fail to exploit an algorithm 
that is demonstrably efficient, or we risk the possibility that our system will en- 
counter a distribution of machine availabilities that renders our algorithm use- 
less. In the present paper, we make stronger assumptions about the algorithm’s 
input, based on large-scale measurement of machine availability [3]. Given these
assumptions, which - we stress - are a close approximation of the behavior of 
actual machines, we show that the MinMax algorithm is competitive for the lev- 
els of replication we intend to use in actual deployment. Obtaining these new 
results requires completely different analysis methods from those used for our 
earlier general-distribution results, which relied on highly unusual availability 
distributions utterly dissimilar to those found in real systems. 

Furthermore, our earlier studies evaluated competitiveness in terms of the 
least available file, which is a straightforward quantity to analyze. However, from 
a systems perspective, a better metric is the effective availability of the overall 
storage system, which is readily computable in simulation. In the present paper, 
we show that all worst-case results for minimum file availability are also worst- 
case results for effective system availability, further legitimizing the relevance of 
our theoretic analyses. 

In our opinion, the significance of this work lies in the fusion of four elements: 
an important problem from an emerging area of systems research, simulation 
results that demonstrate the practical performance of a suite of algorithms, large- 
scale measurements of deployed systems that provide a tractable analytic model, 
and rigorous theoretic analysis to provide confidence in the algorithm selected 
for use in the actual system. We consider this an exemplary synergy of systems, 
simulation, measurement, and theory. 

The remainder of the paper is organized as follows. The next section describes 
the Farsite system and provides some motivation for why file replica placement is 
an important problem. Section 3 describes the algorithms. In Section 4 we further
motivate this paper, followed by a summary of results in Section 5. Section 6
presents a simplified model, which is used in Section 7 to analyze the efficacy of
the algorithms. Section 8 compares the two measures of efficacy. In Section 9 we
conclude the paper by presenting related work.



2 Farsite 



Farsite m is a distributed peer-to-peer file system that runs on a networked 
collection of desktop computers in a large organization, such as a university or 
corporation. It provides a logically centralized storage repository for the files of 
all users in the organization. However, rather than storing these files on dedicated
server machines, Farsite replicates them and distributes them among all
of the client computers sitting on users’ desktops. As compared to centralized 
storage, this architecture yields great savings in hardware capital, physical plant, 
system administration, and operational maintenance, and it eliminates a single 
point of failure and single target of attack. The disadvantage of this approach 
is that users’ desktop machines lack the physical security and continuous support
enjoyed by managed servers, so the system must be designed to resist the
threats to reliability and security that are inherent in a large-scale, distributed, 
untrusted infrastructure. 

For files stored in Farsite, the following properties are maintained: privacy, 
integrity, persistence, and availability. Data privacy and integrity are ensured by 
encryption and digital signatures. File persistence is provided by generating R 
replicas of each file and storing the replicas on different machines. The data will 
persist as long as one of the replicas resides on a machine that does not suffer a 
destructive failure, such as a disk head crash. Since it is difficult to estimate the 
remaining lifetime of a particular disk with any accuracy m, the degree of data 
persistence is considered to be determined entirely by the replication factor R 
and not by any measurable aspect of the particular machines selected for storing 
the replicas. 

In this paper we focus on file availability, meaning the likelihood that the 
file can be accessed by a user at the time it is requested, which is determined 
by the likelihood that at least one replica of that file can be accessed at the 
requested time. The fractional downtime of a machine is the mean fraction of 
time that the machine is unavailable, because it has crashed, has been turned 
off, has been disconnected from the network, etc. A five-week series of hourly 
measurements of over 50,000 desktop machines at Microsoft [3] has shown that
the times at which different machines are unavailable are not significantly corre- 
lated with each other, so the fractional downtime of a file is equal to the product 
of the fractional downtimes of the machines that store replicas of that file. For 
simplicity, we express machine and file availability values as the negative loga- 
rithm of fractional downtime, so the availability of a file equals the sum of the 
availabilities of the R machines that store replicas of the file. 
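The additivity of availability under independent failures can be checked in a few lines. The text says only "negative logarithm"; base 10 is an assumption here (an availability of 3 then means a fractional downtime of 0.001), and the function name is hypothetical.

```python
import math

def file_availability(machine_downtimes):
    """Availability of a file whose replicas sit on machines with the given
    fractional downtimes, assuming independent failures (as measured).
    Availabilities are -log10(fractional downtime); base 10 is an assumption."""
    machine_avail = [-math.log10(d) for d in machine_downtimes]
    # File downtime is the product of machine downtimes, so its negative
    # logarithm is the sum of the machine availabilities.
    return sum(machine_avail)
```

Three replicas on machines that are each down 10% of the time give a file availability of 3, i.e. a fractional downtime of 0.001.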

The goal of a file placement algorithm is to produce an assignment of file 
replicas to machines that maximizes an appropriate objective function. We con- 
sider two objective functions in this paper: (1) the minimum file availability over 
all files and (2) the effective system availability (ESA), defined as the negative 
logarithm of the expected fractional downtime of a file chosen uniformly at ran- 
dom. When we evaluate the efficacy of a file placement algorithm, we are gauging 
its ability to maximize one of these objective functions. For our theoretic anal- 
yses in both our earlier work |2| and the present paper, we focus on the metric 
of minimum file availability, because it is more readily tractable. Our simulation
results, such as those described in Section 4, relate to ESA because it is more
meaningful from a systems perspective. One of our current findings (Section 8)
is that all of our theoretic worst-case results for minimum file availability are
also theoretic worst-case results for effective system availability.
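The two objective functions can be compared on a list of file availabilities. This sketch again assumes base-10 logarithms, so a file of availability f has fractional downtime $10^{-f}$; the function names are hypothetical.

```python
import math

def min_file_availability(file_avails):
    """Objective (1): availability of the least available file."""
    return min(file_avails)

def effective_system_availability(file_avails):
    """Objective (2), ESA: -log10 of the expected fractional downtime of a
    file chosen uniformly at random (base 10 is an assumption)."""
    mean_downtime = sum(10 ** (-f) for f in file_avails) / len(file_avails)
    return -math.log10(mean_downtime)
```

For availabilities 2, 3, 3 the mean downtime is 0.004, so the ESA is about 2.4 while the minimum is 2. In general the ESA is never below the minimum file availability, because the mean downtime cannot exceed the largest single downtime.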
Measurements of over 10,000 file systems on desktop computers at Microsoft
0 indicate that a replication factor of $R = 3$ is achievable in a real-world setting
0. Thus, we have a special interest in the case $R = 3$.

3 Algorithms 

Files in Farsite are partitioned into disjoint sets, each of which is managed by a 
small, autonomous group of machines. This imposes the requirement that a file 
placement algorithm must be capable of operating in a distributed fashion with 
no central coordination. Farsite is also a highly dynamic system in which files 
are created and deleted frequently and in which machine availabilities continu- 
ously change. This imposes the requirement that a file placement algorithm must 
be able to incrementally improve an existing placement, rather than require a 
complete re-allocation of storage resources. These and other considerations uni 
have led us to a family of iterative, swap-based algorithms: One group of ma- 
chines contacts another group (possibly itself), each of which selects a file from 
the set it manages; the groups then decide whether to exchange the machine 
locations of one replica from each file. The groups select files according to one 
of the following algorithms: 

- RandRand swaps a replica between two randomly chosen files,

- MinRand swaps a replica between a minimum-availability file and any other
file, and

- MinMax swaps a replica between a minimum-availability file and a maximum-availability
file.

(We use the particle “Rand” rather than “Any” because this reflects the way
files are selected in the system, even though all that matters for our theoretic 
analysis is the absence of a selection restriction.) The groups swap replicas only 
if doing so reduces the absolute difference between the availabilities of the two 
files, which we call a successful swap. If a pair of files has more than one success- 
ful swap, the algorithm chooses one with minimum absolute difference between 
the files’ availabilities after the swap (although this does not affect theoretical 
efficacy) . Because the algorithms operate in a distributed fashion, their selection 
restrictions are weakened, i.e., the MinMax and MinRand algorithms might select
files whose availability values are not globally minimum or maximum. For our 
theoretic analysis, we concentrate on the more restrictive case in which only 
extremal files are selected. 
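The swap rule shared by all three algorithms can be sketched independently of how the two files are selected. Files are represented as lists of the availabilities of the machines holding their replicas; the function name is hypothetical.

```python
from itertools import product

def best_swap(file_a, file_b):
    """Best successful swap between two files (sketch).

    file_a, file_b: lists of machine availabilities holding the replicas.
    Exchanging replica i of file_a with replica j of file_b is successful if
    it reduces |avail(a) - avail(b)|; among successful swaps the one with the
    smallest resulting difference is chosen. Returns (i, j) or None.
    """
    sa, sb = sum(file_a), sum(file_b)
    best, best_diff = None, abs(sa - sb)
    for i, j in product(range(len(file_a)), range(len(file_b))):
        delta = file_b[j] - file_a[i]           # change in file_a's availability
        diff = abs((sa + delta) - (sb - delta))
        if diff < best_diff:                    # strict: must reduce the difference
            best, best_diff = (i, j), diff
    return best
```

For files with availabilities {0, 1, 2} and {6, 7, 8} the best swap exchanges 0 and 8, reducing the difference from 18 to 2. Two files with identical machine sets have no successful swap.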

4 Motivation 

If, beginning with a random assignment of replicas to machines, we run each 
algorithm until it freezes (meaning no more swaps can be found), we find that the 
three algorithms differ substantially in both the efficacy of their final placements 
and the efficiency with which they achieve those placements. Simulations show 
that the MinMax algorithm improves the availability of the minimum file more 
quickly than the other two algorithms. On the other hand, MinMax tends to freeze 
at a point with lower minimum file availability, since swaps are only considered
between the minimum-availability file and the maximum-availability file.

In earlier work [2], we performed a worst-case analysis to determine each
algorithm’s competitive ratio $\rho = m/m^*$, where m is the availability of a
minimum-availability file when the algorithm freezes, and $m^*$ is the availability
of a minimum-availability file given an optimal placement, for a worst-case
availability distribution. The results were that MinMax (the most efficient algorithm)
was not competitive ($\rho = 0$), whereas MinRand and RandRand were
2/3-competitive for $R = 3$.




Fig. 1. Steady-state behavior of the algorithms (MinMax, MinRand, RandRand)



If we exercise each algorithm in a dynamic scenario that more closely matches 
the environment in which the Farsite system operates, the results are even more 
disconcerting. Figure 1 shows the result of a steady-state simulation in which
two processes operate concurrently on the placement of replicas. One process 
(maliciously) moves random replicas to random machines, simulating the dy- 
namic behavior of users and machines. The other process performs one of our 
three hill-climbing algorithms, trying to repair the damage caused by the ran- 
dom moves. With the exception of unrealistically high correction ratios, MinMax 
performs significantly better than the other two algorithms. 

We are in the unpleasant situation that a theoretical worst-case result ($\rho = 0$
for MinMax) opposes the use of an algorithm that works best for real-world data. 
In this paper, we begin to address this discrepancy by noting the distribution 
of availability values found in a large-scale study of desktop machines in a commercial
environment [3], reproduced here as Figure 2. This figure shows that,
when expressed logarithmically, machine availabilities follow a distribution that 
is nearly uniform. This finding, coupled with the observation that most of our
worst cases [2] need rather unusual distributions of machine availabilities, suggests
that we can improve the bounds of the worst-case analysis by making
stronger assumptions about the input, namely that we have a uniform distribu- 
tion of machine availabilities. 




Fig. 2. Distribution of machine availabilities (measured vs. uniform)



5 Results 

In this paper we will take for granted that the distribution of machine availabilities
is uniform. With this assumption we show that the MinMax algorithm is
competitive. More surprisingly, when the replication factor R grows, the MinRand
and MinMax algorithms have the same asymptotic worst-case efficacy. This is
counterintuitive when looking at our earlier results [2]. We study the case $R = 3$
with special care, since the real Farsite system is expected to be deployed with
$R = 3$. We also give detailed results for $R = 2$ since they are considerably
different from those for $R > 2$. Here is a detailed summary of our results:



Algorithm    general R                  R = 2         R = 3
MinMax       $\rho = 1 - \Theta(1/R)$       $\rho = 0$    $\rho = 1/2$
MinRand      $\rho = 1 - \Theta(1/R)$       $\rho = 1$    $\rho = 22/27$
RandRand     $\rho = 1 - \Theta(1/R^2)$     $\rho = 1$    $\rho = 8/9$
6 Model

We are given a set of N unit-size files, each of which has R replicas. We are
also given a set of $M = N \cdot R$ machines, each of which has the capacity to
store a single file replica. Throughout this paper we assume that machines have
(uniformly distributed) availabilities $0\gamma, 1\gamma, 2\gamma, \ldots, (M-1)\gamma$, for an arbitrary
constant $\gamma$.

Let the R replicas of file f be stored on machines with availabilities
$a_1, \ldots, a_R$. To avoid notational clutter, we overload a variable to name a file
and to give the availability value of the file. Thus, the availability of file f is
$f = a_1 + \cdots + a_R$.

As in our earlier study [2], we examine the point at which the algorithms
freeze. Let m be a file with minimum availability when the algorithm has exhausted
all possible improvements. Let $m^*$ be a file with minimum availability
given an optimal placement for the same values of N and R. We compute the
ratio $\rho = \min m/m^*$ as $N \to \infty$. We say that the algorithm is $\rho$-competitive.
Note that the scale $\gamma$ of the machine availabilities does not affect $\rho$; throughout
this paper we therefore assume $\gamma = 1$.

If two or more files have minimum availability, or if two or more files have 
maximum availability, we allow an adversary to choose which of the files will be 
considered for a potential swap. 
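The model can be exercised with a small MinMax simulation. This is a sketch under assumptions that go beyond the text: the initial placement is a random permutation, ties for the minimum or maximum file are broken by taking the first one (rather than by an adversary), and a finite N replaces the limit $N \to \infty$. It therefore illustrates the freezing process, not the worst-case ratio; $m^* = R(M-1)/2$ is taken from Lemma 3 below.

```python
import random

def minmax_freeze(N, R, seed=0):
    """Run MinMax hill climbing until it freezes (sketch).

    Machines have availabilities 0, 1, ..., M-1 (gamma = 1), M = N * R.
    Returns m / m* with m* = R(M-1)/2 (Lemma 3).
    """
    M = N * R
    rng = random.Random(seed)
    machines = list(range(M))
    rng.shuffle(machines)                       # random initial placement
    files = [machines[k * R:(k + 1) * R] for k in range(N)]
    while True:
        sums = [sum(f) for f in files]
        lo = sums.index(min(sums))              # first minimum-availability file
        hi = sums.index(max(sums))              # first maximum-availability file
        gap = sums[hi] - sums[lo]
        best = None
        for i in range(R):
            for j in range(R):
                delta = files[hi][j] - files[lo][i]   # gain for the min file
                new_gap = abs(gap - 2 * delta)
                if new_gap < gap and (best is None or new_gap < best[0]):
                    best = (new_gap, i, j)
        if best is None:
            break                               # no successful swap: frozen
        _, i, j = best
        files[lo][i], files[hi][j] = files[hi][j], files[lo][i]
    m = min(sum(f) for f in files)
    return m / (R * (M - 1) / 2)
```

Each successful swap strictly decreases the sum of squared file availabilities, so the loop terminates. The returned ratio always lies in (0, 1], because the minimum file can never exceed the average $R(M-1)/2$.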

7 Analysis 

We start this section with a number of simple observations which will
help us simplify the analysis.

MinMax searches for swap candidates in a subset of the candidates of MinRand, and similarly
the candidates of MinRand form a subset of those of RandRand; thus

Lemma 1. $\rho_{\mathrm{MinMax}} \le \rho_{\mathrm{MinRand}} \le \rho_{\mathrm{RandRand}}$.

In this paper we study a restricted case (we assume uniform availability) of
the general problem that was investigated in [?]. We have immediately:

Lemma 2. For the same algorithm, we have $\rho_{\mathrm{general}} \le \rho_{\mathrm{uniform}}$.

The next lemma states a simple observation: there is always an optimal
assignment in which all files have the same availability. This simplifies calculating the competitive ratio $\rho$.

Lemma 3. For any $R > 1$ and uniform distribution there is an optimal assignment, where all the files have the same availability $R(M-1)/2$.
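As a small sanity check of Lemma 3 (a sketch, not part of the original analysis), the Python snippet below computes $m^* = R(M-1)/2$ with $\gamma = 1$ and $M = N \cdot R$, and, for $R = 2$ only, builds the balanced placement that pairs machine $i$ with machine $M-1-i$, so that every file has availability $M - 1 = R(M-1)/2$. The helper names are hypothetical, and `ratio` evaluates one fixed placement rather than the limit over $N$ used in the definition of $\rho$.

```python
from fractions import Fraction

def optimal_min_availability(N, R):
    """m* from Lemma 3 with gamma = 1: R(M - 1)/2 where M = N * R."""
    M = N * R
    return Fraction(R * (M - 1), 2)

def balanced_pairing(N):
    """R = 2 only: pair machine i with machine M - 1 - i; every file gets availability M - 1."""
    M = 2 * N
    return [(i, M - 1 - i) for i in range(N)]

def ratio(placement, R):
    """Minimum file availability of a placement divided by m* (fixed N, not the limit)."""
    N = len(placement)
    return Fraction(min(sum(f) for f in placement)) / optimal_min_availability(N, R)

print(balanced_pairing(3))                   # [(0, 5), (1, 4), (2, 3)]
print(ratio(balanced_pairing(3), 2))         # 1
print(ratio([(0, 1), (2, 3), (4, 5)], 2))    # 1/5
```

The total availability $0 + 1 + \cdots + (M-1) = M(M-1)/2$ spread evenly over $N$ files gives exactly $R(M-1)/2$ per file, which is why no placement can do better.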

Lemma 4. $\rho_{\mathrm{MinRand},R} \le 1 - c/R$, where $c$ is a positive constant. If $R = 3$ then
$c = 5/9$.

Theorem 1. $\rho_{\mathrm{MinMax},R} = \rho_{\mathrm{MinRand},R} = 1 - \Theta(1/R)$.

Theorem 2. $\rho_{\mathrm{MinRand},3} = 22/27$.
Proof. (We include this proof in the extended abstract as a representative of the
proof techniques in this paper.) With Lemma 4 we have that $\rho_{\mathrm{MinRand},3} \le 22/27$.
To prove the theorem it is therefore sufficient to show that $\rho_{\mathrm{MinRand},3} \ge 22/27$.

The intuition behind the proof is as follows. We partition the set of machines into five regions.
With a detailed case study we show which combinations of regions do not allow
successful swaps with the minimum file. Then we classify the valid combinations
of regions, and give a combinatorial proof about their quantity which ultimately
leads to a lower bound for the availability of the minimum file. For simplicity
we omit insignificant constants throughout this proof (i.e. we write $M$ instead
of $M - 1$).

Here are the details: Let the minimum file be $m = a_1 + a_2 + a_3$ with $a_1 > a_2 > a_3$. Assume for the sake of contradiction that $m < 11/9 \cdot M$. We define
the following sets of machines (see Figure 3): machines in $A$ have availability
less than $a_3$, machines in $B$ between $a_3$ and $a_2$, machines in $C$ between $a_2$ and
$(a_1 + a_2)/2$, machines in $D$ between $(a_1 + a_2)/2$ and $a_1$, and machines in $E$ more
than $a_1$. With this partitioning (and ignoring constants, $a_3 = |A|$, $a_2 = |A| + |B|$, and $a_1 = M - |E|$) the availability of the minimum file translates
into $m = 2|A| + |B| + M - |E|$, and with $m < 11/9 \cdot M$ and $M = 3N$ we get

$$2|A| + |B| - |E| < 2/9 \cdot M = 2/3 \cdot N.$$
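The partition identity can be checked by brute force. The Python sketch below (hypothetical helper `region_counts`, $\gamma = 1$, machine availabilities $0, \ldots, M-1$) counts the machines in regions $A$ to $E$ for a minimum file whose machines have availabilities $a_1 > a_2 > a_3$, skipping the file's own three machines. With strict region boundaries the identity $a_1 + a_2 + a_3 = 2|A| + |B| + M - |E|$ turns out to be exact, and $|C|$ and $|D|$ differ by at most one machine, consistent with omitting constants in the proof. Putting a machine that sits exactly at $(a_1 + a_2)/2$ into $C$ is an arbitrary tie-break chosen for this sketch.

```python
def region_counts(a, M):
    """Count machines 0..M-1 in regions A..E relative to the minimum file a = (a1, a2, a3),
    a1 > a2 > a3, skipping the file's own machines. Ties at (a1 + a2)/2 go to C."""
    a1, a2, a3 = a
    mid = (a1 + a2) / 2
    counts = dict.fromkeys("ABCDE", 0)
    for x in range(M):
        if x in a:
            continue
        if x < a3:
            counts["A"] += 1
        elif x < a2:
            counts["B"] += 1
        elif x <= mid:
            counts["C"] += 1
        elif x < a1:
            counts["D"] += 1
        else:
            counts["E"] += 1
    return counts

M = 12
for a1 in range(M):
    for a2 in range(a1):
        for a3 in range(a2):
            c = region_counts((a1, a2, a3), M)
            assert a1 + a2 + a3 == 2 * c["A"] + c["B"] + M - c["E"]
            assert abs(c["C"] - c["D"]) <= 1
print(region_counts((7, 4, 1), 9))  # {'A': 1, 'B': 2, 'C': 1, 'D': 1, 'E': 1}
```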




Fig. 3. Partition of the machines into the regions $A$, $B$, $C$, $D$, and $E$



Case 1: We consider all the files $f = b_1 + b_2 + b_3$ with $b_1 \in E$ and $b_2 > b_3$.
If $b_2 + b_3 > a_2 + a_3$, then we swap the machines $b_1$ and $a_1$ and get $m' = b_1 + a_2 + a_3 > a_1 + a_2 + a_3 = m$ and $f' = a_1 + b_2 + b_3 > a_1 + a_2 + a_3 = m$. Thus
$b_2 + b_3 \le a_2 + a_3$, and therefore (with $b_2 > b_3$) $b_3 < (a_2 + a_3)/2$. Since each $b_1 \in E$
needs a $b_3$, we know that $|E| < (a_2 + a_3)/2$. Since $|A| + |B|/2 = (a_2 + a_3)/2$ and
$2|A| + |B| - |E| < 2/9 \cdot M$ we get $(a_2 + a_3)/2 < 2/9 \cdot M$. On the other hand,
$|E| < (a_2 + a_3)/2$ shows that $(a_1 + a_2)/2 > 1/2 \cdot M$ (since $a_1 + a_2 = M - |E| + a_2 > M + (a_2 - a_3)/2 \ge M$). Since $b_2 + b_3 \le a_2 + a_3 < 4/9 \cdot M < 1/2 \cdot M < (a_1 + a_2)/2$, we know that $b_2 \in A \cup B \cup C$. If $b_2 > a_2$
then we have $b_3 < a_3$; in other words, if $b_2 \in C$ then $b_3 \in A$. If $b_2 \in A \cup B$ then
$b_3 \in A \cup B$. Since we have used all machines in set $E$ in this case, there are no
machines in $E$ in the following cases.

Case 2: We identify all remaining files $f = b_1 + b_2 + b_3$ with $b_1 \in C$ and
$b_2 > b_3$. If $b_3 \in D$, then we swap machines $b_1$ and $a_2$, getting $m' = a_1 + b_1 + a_3 > a_1 + a_2 + a_3 = m$ and $f' = a_2 + b_2 + b_3 > a_2 + b_3 + b_3 > a_2 + 2(a_1 + a_2)/2 > m$.
Therefore $b_3 \notin D$. Thus for each $b_1 \in C$ we have $b_2 \in A \cup B \cup C \cup D$ and
$b_3 \in A \cup B \cup C$. We have used all machines in $C$; henceforth the sets $C$ and $E$
are taboo.

Case 3: We identify all the remaining files $f = b_1 + b_2 + b_3$ with $b_1 \in B$ and
$b_2 > b_3$. If $b_3 \in D$, then we swap machines $b_1$ and $a_3$, getting $m' = a_1 + a_2 + b_1 > a_1 + a_2 + a_3 = m$ and $f' = a_3 + b_2 + b_3 > a_3 + b_3 + b_3 > a_3 + 2(a_1 + a_2)/2 = m$.
Therefore $b_3 \notin D$. Thus for each $b_1 \in B$ we have $b_2 \in A \cup B \cup D$ and $b_3 \in A \cup B$.
Henceforth the sets $B$, $C$, $E$ are taboo.

Case 4: Finally we identify all the remaining files $f = b_1 + b_2 + b_3$ with
$b_1 \in D$ and $b_2 > b_3$. If $b_3 \in D$, then we swap machines $b_1$ and $a_2$, getting
$m' = a_1 + b_1 + a_3 > a_1 + a_2 + a_3 = m$ and $f' = a_2 + b_2 + b_3 > a_2 + b_3 + b_3 > a_2 + 2(a_1 + a_2)/2 > m$. Thus for each $b_1 \in D$ we have $b_2 \in A \cup D$ and $b_3 \in A$.

From the above analysis we have seen that a file $f$ can only consist of these
combinations of regions: (Case 1) $E + C + A$ or $E + (A \cup B) + (A \cup B)$, or (Case 2) $C + (A \cup B \cup C \cup D) + (A \cup B \cup C)$, or (Case 3) $B + (A \cup B \cup D) + (A \cup B)$,
or (Case 4) $D + A + A$ or $D + D + A$. We define the two functions $g_1, g_2$, where $|A|, \ldots, |E|$ now count the replicas of the file $f$ in the respective regions:

$$g_1(f) = |C| - |D|, \qquad g_2(f) = 2|A| + |B| - |E|.$$

Figure 4 shows all possible files $f$ with respect to the functions $g_1(f)$ and $g_2(f)$.
Note that for all possible files $f$ we have $g_2(f) \ge 0$. We put the files into three
classes. Class $X$ are the files $f$ with $g_1(f) < 0$ (the black circles); class $Y$ are the
files with $g_1(f) = 0$ (the white circles); class $Z$ are the files with $g_1(f) > 0$ (the
grey circles). Note that for files $f \in X$ we have $g_1(f) \ge -2$ and $g_2(f) \ge 2$, and
that for files $f \in Y$ we have $g_2(f) \ge 1$.
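These claims about the classes can be checked mechanically. The Python sketch below enumerates every region combination allowed by Cases 1 to 4 (as a superset, since it ignores the ordering $b_2 > b_3$), evaluates $g_1(f) = |C| - |D|$ and $g_2(f) = 2|A| + |B| - |E|$ by counting the file's replicas per region, and asserts the bounds used next: $g_2(f) \ge 0$ for all files, $g_1(f) \ge -2$ and $g_2(f) \ge 2$ for class $X$, and $g_2(f) \ge 1$ for class $Y$. The function names are illustrative only.

```python
from itertools import product

G1 = {"A": 0, "B": 0, "C": 1, "D": -1, "E": 0}   # g1 = |C| - |D|
G2 = {"A": 2, "B": 1, "C": 0, "D": 0, "E": -1}   # g2 = 2|A| + |B| - |E|

def g1(f):
    return sum(G1[r] for r in f)

def g2(f):
    return sum(G2[r] for r in f)

def allowed_files():
    """Region triples (b1, b2, b3) permitted by Cases 1-4 of the proof."""
    files = {("E", "C", "A")}
    files |= {("E", x, y) for x, y in product("AB", repeat=2)}   # Case 1
    files |= {("C", x, y) for x, y in product("ABCD", "ABC")}    # Case 2
    files |= {("B", x, y) for x, y in product("ABD", "AB")}      # Case 3
    files |= {("D", "A", "A"), ("D", "D", "A")}                  # Case 4
    return files

files = allowed_files()
X = [f for f in files if g1(f) < 0]
Y = [f for f in files if g1(f) == 0]
Z = [f for f in files if g1(f) > 0]
assert all(g2(f) >= 0 for f in files)
assert all(g1(f) >= -2 and g2(f) >= 2 for f in X)
assert all(g2(f) >= 1 for f in Y)
print(len(files), len(X), len(Y), len(Z))  # 25 4 10 11
```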

We have $M$ machines, thus (ignoring the single minimum file $m$) $|A| + |B| +
|C| + |D| + |E| = M = 3N$. This translates into $|X| + |Y| + |Z| = N$ for the three
classes $X, Y, Z$. The sets $C$ and $D$ were defined such that they exactly split the
region of machines between $a_2$ and $a_1$, hence $|C| = |D|$. Summing $g_1$ over all files gives $|C| - |D| = 0$; using $g_1(f) \ge -2$
for $f \in X$ and $g_1(f) \ge 1$ for $f \in Z$, this yields $0 \ge -2|X| + |Z|$, i.e. the constraint $|C| = |D|$ translates into
$2|X| \ge |Z|$. Combining both constraints, we get $3|X| + |Y| \ge |X| + |Y| + |Z| = N$.
Multiplying by $2/3$ gives $2|X| + |Y| \ge 2|X| + 2/3 \cdot |Y| \ge 2/3 \cdot N$. We use this
inequality to get:

$$2|A| + |B| - |E| = \sum_{f \in X} g_2(f) + \sum_{f \in Y} g_2(f) + \sum_{f \in Z} g_2(f) \ge 2|X| + |Y| \ge 2/3 \cdot N.$$

(The first equality is the definition of $g_2$; the middle inequality holds because files
$f \in X$ have $g_2(f) \ge 2$, files $f \in Y$ have $g_2(f) \ge 1$, and files $f \in Z$ have
$g_2(f) \ge 0$.)

This contradicts our assumption that $2|A| + |B| - |E| < 2/3 \cdot N$ and therefore
the assumption that $m < 11/9 \cdot M$. Thus $m \ge 11/9 \cdot M$. With Lemma 3 we know
that $m^* = 3(M - 1)/2$. Thus $\rho = m/m^* \ge 22/27$ as $M$ goes to infinity.

Theorem 3. $\rho_{\mathrm{RandRand},R} = 1 - c/R^2$, where $c$ is a positive constant. If $R$ is
odd then $c = 1$.

Theorem 4. $\rho_{\mathrm{MinMax},3} = 1/2$.

Theorem 5. $\rho_{\mathrm{MinMax},R} = 1 - 2/R$, for $R$ even.
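The $R = 3$ entries of the summary table, and the MinMax entry for $R = 2$, follow from these statements by exact arithmetic, as the short Python sketch below recomputes with `fractions.Fraction` (the function names are hypothetical): Lemma 4 with $c = 5/9$ gives $1 - 5/27 = 22/27$, matching Theorem 2 and the final step $(11/9)/(3/2)$ of its proof; Theorem 3 with $c = 1$ gives $8/9$; and Theorem 5 gives $0$ for $R = 2$.

```python
from fractions import Fraction

def minrand_upper_bound(R, c):
    """Lemma 4: rho_MinRand,R <= 1 - c/R."""
    return 1 - Fraction(c) / R

def randrand_ratio(R, c=1):
    """Theorem 3: rho_RandRand,R = 1 - c/R^2 (c = 1 for odd R)."""
    return 1 - Fraction(c) / R**2

def minmax_ratio_even(R):
    """Theorem 5: rho_MinMax,R = 1 - 2/R for even R."""
    return 1 - Fraction(2, R)

print(minrand_upper_bound(3, Fraction(5, 9)))  # 22/27
print(Fraction(11, 9) / Fraction(3, 2))        # 22/27, last step of the proof of Theorem 2
print(randrand_ratio(3))                       # 8/9
print(minmax_ratio_even(2))                    # 0
```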
Fig. 4. Possible locations for a file $f$ with respect to $g_1(f)$ and $g_2(f)$.



8 Measures of Efficacy 



Any worst-case result for minimum file availability is also
a worst-case result for effective system availability: we show that the effective
system availability can be as low as the minimum file availability, and that it cannot
be lower.



Theorem 6. Let $b$ be the base for converting downtime $d$ into availability $a$,
that is $a = -\log_b d$. As $b \to \infty$, the effective system availability (ESA) equals
the availability of the minimum file.

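Proof. Let $b = e^c$. Then $a = -\log_b d = -1/c \cdot \ln d$, where $\ln = \log_e$. If $b \to \infty$
then $c \to \infty$. Let $m$ be the availability of the minimum file. Assume that there
are $\lambda > 0$ files with availability $m$ and $N - \lambda$ files with availability $f_i$ with
$f_i > m$, for $i = 1, \ldots, N - \lambda$. Then, applying the definition of ESA,

$$\lim_{b \to \infty} \mathrm{ESA} = \lim_{c \to \infty} -\frac{1}{c} \ln\left(\frac{1}{N}\left(\lambda b^{-m} + \sum_{i=1}^{N-\lambda} b^{-f_i}\right)\right) = \lim_{c \to \infty} -\frac{1}{c} \ln\left(\frac{\lambda b^{-m}}{N}\right) = \lim_{c \to \infty}\left(m - \frac{1}{c}\ln\frac{\lambda}{N}\right) = m.$$

(The second equality holds because $b^{-f_i}/b^{-m} \to 0$ for $f_i > m$.)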
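The limit in Theorem 6 can be observed numerically. The Python sketch below evaluates the ESA expression used in the proof, $-\log_b\big(\frac{1}{N}\sum_f b^{-f}\big)$, with $b = e^c$ as in the proof; factoring out $b^{-m}$ (a log-sum-exp shift) avoids floating-point underflow for large $c$. The function name `esa` is hypothetical. For $b > 1$ the computed value never drops below $m$ (compare Theorem 7 below) and approaches $m$ roughly like $\ln(N/\lambda)/c$.

```python
import math

def esa(availabilities, c):
    """ESA = -log_b((1/N) * sum_f b**(-f)) for b = e**c, computed as
    m - ln((1/N) * sum_f e**(-c * (f - m))) / c, where m is the minimum availability."""
    m = min(availabilities)
    n = len(availabilities)
    s = sum(math.exp(-c * (f - m)) for f in availabilities)  # every term is at most 1
    return m - math.log(s / n) / c

files = [2, 3, 5]  # one minimum file (lambda = 1) with availability m = 2
for c in (1, 10, 100, 1000):
    print(c, round(esa(files, c), 4))
```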



Similarly, 

Theorem 7. Let $b$ be the positive base for converting uptime into availability.
Then $\mathrm{ESA} \ge m$.
9 Related Work

Other than Farsite, serverless distributed file systems include xFS [?] and Frangipani [?], both of which provide high availability and reliability through distributed RAID semantics, rather than through replication. Archival Intermemory [?] and OceanStore [?] both use erasure codes and widespread data distribution to avoid data loss. The Eternity Service [?] uses full replication to prevent
loss even under organized attack, but does not address automated placement of
data replicas. A number of peer-to-peer file sharing applications have been released recently: Napster [?] and Gnutella [?] provide services for finding files,
but they neither explicitly replicate files nor determine the locations where files
will be stored. Freenet [?] performs file migration to generate or relocate replicas
near their points of usage.

To the best of our knowledge, [?] is the first study of the availability of replicated files, and also the first competitive analysis of the efficacy of a hill-climbing
algorithm.

Our work has something in common with the research area of approximation algorithms, especially online approximation algorithms
such as scheduling [?]. In online computing, an algorithm must decide
how to act on incoming items without knowledge of the future. This seems to
be related to our work, in the sense that a distributed hill-climbing algorithm also
makes decisions locally, without knowledge of the whole system. Also, online
algorithms research naturally focuses on giving bounds for the efficacy of an
algorithm rather than for its efficiency.

Competitive analysis has been criticized as being too crude and unrealistic [?].
In this paper, we have narrowed the gap between theoretical worst-case analysis
and real-world simulations, a gap that arises from unusual worst cases,
by making stronger and more realistic assumptions about the input. This
approach is well known in the area of online algorithms; for an overview,
see Chapter 5 in [?] for paging algorithms, and Section 2.3 in [?] for bin packing
algorithms.
Abstract. A cycle cover of a graph is a spanning subgraph where each node is
part of exactly one simple cycle. A $k$-cycle cover is a cycle cover where each
cycle has length at least $k$. We call the decision problems whether a directed or
undirected graph has a $k$-cycle cover $k$-DCC and $k$-UCC. Given a graph with
edge weights one and two, Min-$k$-DCC and Min-$k$-UCC are the minimization
problems of finding a $k$-cycle cover with minimum weight.

We present factor 4/3 approximation algorithms for Min-$k$-DCC with running
time $O(n^{5/2})$ (independent of $k$). Specifically, we obtain a factor 4/3 approximation algorithm for the asymmetric travelling salesperson problem with distances
one and two and a factor 2/3 approximation algorithm for the directed path packing problem with the same running time. On the other hand, we show that $k$-DCC
is NP-complete for $k \ge 3$ and that Min-$k$-DCC has no PTAS for $k \ge 4$, unless
P = NP.

Furthermore, we design a polynomial time factor 7/6 approximation algorithm
for Min-$k$-UCC. As a lower bound, we prove that Min-$k$-UCC has no PTAS for
$k \ge 12$, unless P = NP.

1 Introduction 

A cycle cover of a directed or undirected graph $G$ is a spanning subgraph $C$
where each node of $G$ is part of exactly one simple cycle of $C$. Computing cycle covers
is an important task in graph theory; see for instance Lovász and Plummer [?], Graham
et al. [?], and the vast literature cited there.

A $k$-restricted cycle cover (or $k$-cycle cover for short) is a cycle cover in which
each cycle has length at least $k$. To be specific, we call the decision problem whether
a graph has a $k$-cycle cover $k$-DCC if the graph is directed, and $k$-UCC if the graph
is undirected. Since $k$-DCC and $k$-UCC are NP-complete for $k \ge 3$ and $k \ge 6$,
respectively, we also consider the following relaxation: given a complete loopless graph
with edge weights one and two, find a $k$-cycle cover of minimum weight. Note that
a graph $G = (V, E)$ has a $k$-cycle cover if and only if the corresponding weighted graph has a
$k$-cycle cover of weight $|V|$, where edges get weight one and "nonedges" get weight
two in the corresponding complete graph. We call these problems Min-$k$-DCC and
Min-$k$-UCC. They stand in one-to-one correspondence with simple 2-factors as defined
by Hartvigsen [?]. A simple 2-factor is a spanning subgraph that contains only node-disjoint paths and cycles of length at least $k$. (The paths arise from deleting the weight
two edges from the cycles.)
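The relaxation can be made concrete with a small Python sketch (the helper names `make_weight`, `cover_weight`, and `is_k_cycle_cover` are hypothetical). Nodes are $0, \ldots, n-1$, edges of $G$ get weight one and nonedges weight two, and a cycle cover is a list of node lists, each listing one cycle in order. In the example, $G$ consists of two directed 2-cycles: it has a 2-cycle cover of weight $|V| = 4$, but every 3-cycle cover (here necessarily a Hamiltonian cycle) has to use weight-two edges.

```python
def make_weight(edges, directed=True):
    """Weight function of the complete loopless graph: 1 for edges of G, 2 for nonedges."""
    E = set(edges) if directed else {frozenset(e) for e in edges}

    def w(u, v):
        if u == v:
            raise ValueError("the complete graph is loopless")
        return 1 if ((u, v) if directed else frozenset((u, v))) in E else 2
    return w

def cover_weight(cycles, w):
    """Sum of w over the edges c[i] -> c[i+1] of every cycle, closing edge included."""
    return sum(w(c[i], c[(i + 1) % len(c)]) for c in cycles for i in range(len(c)))

def is_k_cycle_cover(n, cycles, k):
    """Every node 0..n-1 lies on exactly one cycle and every cycle has length >= k."""
    nodes = sorted(v for c in cycles for v in c)
    return nodes == list(range(n)) and all(len(c) >= k for c in cycles)

w = make_weight([(0, 1), (1, 0), (2, 3), (3, 2)])
two_cover, tour = [[0, 1], [2, 3]], [[0, 1, 2, 3]]
print(is_k_cycle_cover(4, two_cover, 2), cover_weight(two_cover, w))  # True 4
print(is_k_cycle_cover(4, tour, 3), cover_weight(tour, w))            # True 6
```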

* supported by DFG research grant Re 672/3 
As our main contribution, we devise approximation algorithms for finding minimum
weight $k$-cycle covers in graphs with weights one and two. Moreover, we provide lower
bounds in terms of NP-completeness and nonapproximability, thus determining the
computational complexity of these problems for almost all $k$.

1.1 Previous Results 

The problems 2-DCC and Min-2-DCC of finding a (minimum) 2-cycle cover in directed
graphs can be solved in polynomial time by reduction to the bipartite matching problem.
To our knowledge, nothing is known for values $k \ge 3$.
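The reduction for 2-DCC can be sketched as follows (an illustration with our own naming, not code from the paper): put an out-copy and an in-copy of every node on the two sides of a bipartite graph and connect out-copy $u$ with in-copy $v$ for every edge $(u, v)$. A perfect matching picks exactly one outgoing and one incoming edge per node, i.e. a permutation of $V$, whose cycles form a cycle cover; since the graph is loopless, every cycle has length at least 2. The Python sketch below uses Kuhn's augmenting-path algorithm for the matching and handles only the decision version, not the weighted problem Min-2-DCC.

```python
def two_cycle_cover(n, edges):
    """Return a 2-cycle cover (list of cycles) of the loopless digraph on 0..n-1, or None."""
    adj = {u: [] for u in range(n)}
    for u, v in edges:
        if u != v:                       # loops would give cycles of length 1
            adj[u].append(v)
    match_in = {}                        # in-copy v -> out-copy u, i.e. chosen edge (u, v)

    def augment(u, seen):
        """Kuhn's algorithm: search for an augmenting path starting at out-copy u."""
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match_in or augment(match_in[v], seen):
                    match_in[v] = u
                    return True
        return False

    for u in range(n):
        if not augment(u, set()):
            return None                  # no perfect matching, hence no 2-cycle cover
    succ = {u: v for v, u in match_in.items()}
    cycles, visited = [], set()
    for start in range(n):
        cycle, v = [], start
        while v not in visited:          # follow the permutation until it closes
            visited.add(v)
            cycle.append(v)
            v = succ[v]
        if cycle:
            cycles.append(cycle)
    return cycles

print(two_cycle_cover(4, [(0, 1), (1, 0), (2, 3), (3, 2), (1, 2)]))  # [[0, 1], [2, 3]]
```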

The problem 3-UCC of finding a 3-cycle cover in undirected graphs can be solved in
polynomial time using Tutte's reduction to the classical perfect matching problem
in undirected graphs, which can be solved in polynomial time (see Edmonds [?]). Also
Min-3-UCC can be solved in polynomial time. Hartvigsen [?] has designed a powerful
polynomial time algorithm for 4-UCC. This algorithm works for Min-4-UCC, too. He has
also presented a polynomial time algorithm that computes a minimum weight 5-cycle
cover in graphs where the weight one edges form a bipartite graph [?]. On the other
hand, Cornuéjols and Pulleyblank [?] have reported that Papadimitriou showed the NP-completeness of $k$-UCC for $k \ge 6$.

Let $n$ be the number of nodes of a graph $G = (V, E)$. For $k > n/2$ the problem
Min-$k$-DCC is the asymmetric and Min-$k$-UCC is the symmetric travelling salesperson problem with distances one and two. These problems are APX-complete [?]. For
explicit lower bounds, see Engebretsen and Karpinski [?]. The best upper bound for
the symmetric case is due to Papadimitriou and Yannakakis [?]. They give a factor
7/6 approximation algorithm running in polynomial time. For the asymmetric case,
Vishwanathan [?] presents a polynomial time factor 17/12 approximation algorithm.
Exploiting an algorithm by Kosaraju, Park, and Stein [?] for the asymmetric maximum
travelling salesperson problem, one obtains an approximation algorithm with performance ratio $88/63 \approx 1.397$ by replacing weights two with weights zero.

Closely related to the travelling salesperson problems with distances one and two
is the node-disjoint path packing problem. This problem has various applications, such
as mapping parallel programs to parallel architectures and optimization of code; see
e.g. Vishwanathan [?] and the pointers provided there. We are given a directed or
undirected graph. Our goal is to find a spanning subgraph $S$ consisting of node-disjoint
paths such that the number of edges in $S$ is maximized. Utilizing the algorithms of
Papadimitriou and Yannakakis [?] and of Kosaraju, Park, and Stein [?], one obtains a
polynomial time factor 5/6 approximation algorithm for the undirected problem and a
polynomial time approximation algorithm with performance ratio $38/63 \approx 0.603$ for
the directed problem.

1.2 Our Results 

We present factor 4/3 approximation algorithms for Min-$k$-DCC with running time
$O(n^{5/2})$ (independent of $k$). Specifically, we obtain a factor 4/3 approximation algorithm for the asymmetric travelling salesperson problem with distances one and two
and a factor 2/3 approximation algorithm for the directed node-disjoint path packing
problem with the same running time, thus improving the results of Vishwanathan and
Kosaraju, Park, and Stein. On the other hand, we show that $k$-DCC is NP-complete for
$k \ge 3$ and that Min-$k$-DCC does not have a PTAS for $k \ge 4$, unless P = NP. For the
undirected case, we design factor 7/6 approximation algorithms for Min-$k$-UCC with
polynomial running time (independent of $k$). They include the algorithm of Papadimitriou
and Yannakakis as a special case. As a lower bound, we prove that there is no PTAS for
Min-$k$-UCC for $k \ge 12$, unless P = NP.

Input: a complete loopless directed graph $G$ with edge weights one and two,
an integer $k \ge 3$.
Output: a $k$-cycle cover of $G$.
1. Compute a minimum weight 2-cycle cover $C$ of $G$.
2. Form the bipartite graph $B$ and compute the function $F$.
3. Compute a decomposition of $F$ as in Lemma 2.
4. Patch the cycles of $C$ together according to the refined patching procedure.

Fig. 1. The algorithm for directed cycle covers.

2 Approximation Algorithms for Directed Cycle Covers 

In this section, we present approximation algorithms for Min-$k$-DCC with performance
ratio 4/3 for any $k \ge 3$ running in time $O(n^{5/2})$ (independent of $k$). In particular, we
obtain a factor 4/3 approximation algorithm for the asymmetric travelling salesperson
problem with distances one and two by choosing $k > n/2$.

Our input consists of a complete loopless directed graph $G$ with node set $V$ of
cardinality $n$, a weight function $w$ that assigns each edge of $G$ weight one or two, and an
integer $k \ge 3$. Our aim is to find a $k$-cycle cover of minimum weight. For the analysis, we
assume that a minimum weight $k$-cycle cover of $G$ has weight $n + \ell$ for some $0 \le \ell \le n$.
In other words, a minimum weight $k$-cycle cover consists of $n - \ell$ edges of weight one
and $\ell$ edges of weight two.

Figure 1 gives an overview of our algorithm. A detailed explanation of each of the
steps is given in the subsequent paragraphs.

Computing a 2-cycle cover. We first compute an optimal 2-cycle cover $C$ of $G$. This can
be done in polynomial time. Assume that this cover $C$ consists of cycles $c_1, \ldots, c_r$. We
denote the set $\{c_1, \ldots, c_r\}$ by $\mathcal{C}$. The lengths of some of these cycles may already be $k$
or larger, but some cycles, say $c_1, \ldots, c_s$ for $s \le r$, have length strictly less than $k$. The
basic idea of our algorithm is to use cycle patching (also called subtour patching when
considering travelling salesperson problems; see Lawler et al. [?]). A straightforward
way is to discard one edge (if possible of weight two) of each cycle of length strictly
less than $k$ and patch the resulting paths arbitrarily together to obtain one long cycle.
An easy analysis shows that this yields a factor 3/2 approximation. We obtain the 4/3
approximation by refining the patching procedure.
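The straightforward patching step can be sketched in Python as follows (the helper name `naive_patch` is hypothetical; cycles are node lists and `w(u, v)` returns 1 or 2). Each cycle shorter than $k$ is opened by discarding one edge, a weight-two edge if it has one, and the resulting paths are chained into a single cycle. As a simplification, this sketch does not handle the corner case in which the merged cycle is still shorter than $k$; the refined procedure deals with that by additionally merging with one of the long cycles.

```python
def naive_patch(cycles, w, k):
    """Open every cycle shorter than k at one edge (weight two if possible) and chain
    the resulting paths into one cycle; cycles of length >= k are kept unchanged."""
    result, paths = [], []
    for c in cycles:
        if len(c) >= k:
            result.append(c)
            continue
        L = len(c)
        heavy = [i for i in range(L) if w(c[i], c[(i + 1) % L]) == 2]
        i = heavy[0] if heavy else L - 1     # discard the edge c[i] -> c[i+1]
        paths.append(c[i + 1:] + c[:i + 1])  # path from c[i+1] around to c[i]
    merged = [v for p in paths for v in p]
    return result + ([merged] if merged else [])

E = {(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 4)}

def w(u, v):
    return 1 if (u, v) in E else 2

print(naive_patch([[0, 1], [2, 3], [4, 5]], w, 3))  # [[0, 1, 2, 3, 4, 5]]
```

In the example, three 2-cycles of weight-one edges become one 6-cycle of weight 9: three weight-one edges are discarded and three weight-two edges are added.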
Auxiliary edges. For the refined patching procedure, we form a bipartite graph $B$ as
follows: we have the node set $\mathcal{C}_{<k} = \{c_1, \ldots, c_s\}$ on the one side and $V$ on the other.
There is an edge $(c, v)$ in $B$ iff $v$ does not belong to the cycle $c$ and there is a node $u$ in
$c$ such that $(u, v)$ has weight one in $G$.

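Lemma 1. $B$ has a matching of cardinality at least $s - \ell$.

Proof. Consider an optimal $k$-cycle cover $C_{\mathrm{opt}}$ of $G$. Since the lengths of the cycles in
$\mathcal{C}_{<k}$ are strictly less than $k$, for each cycle $c$ in $\mathcal{C}_{<k}$ there is an edge $(u, v)$ of $C_{\mathrm{opt}}$ such
that $u$ belongs to $c$ but $v$ does not. Fix such an edge for each cycle in $\mathcal{C}_{<k}$. Since $C_{\mathrm{opt}}$ is a cycle cover, distinct fixed edges have distinct end nodes $v$. At least $s - \ell$
of these edges have weight one, thus appear in $B$ and form a matching. □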

Decomposition of functions. We compute a maximum matching $M$ in $B$. From $M$ we
obtain a directed graph $F = (\mathcal{C}, A)$ with $(c, c') \in A$ whenever $(c, u)$ is an edge of $M$
and $u$ is a node of $c'$. Each node of $F$ has outdegree at most one, thus $F$ defines a partial
function $\mathcal{C} \to \mathcal{C}$ whose domain is a subset of $\mathcal{C}_{<k}$. By abuse of notation, we call this
function again $F$. By the construction of $B$, we have $F(c) \ne c$ for all $c \in \mathcal{C}$, i.e. $F$ does
not contain any loops.
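The construction of $B$ and $F$ can be sketched in Python (names hypothetical; the cycles of the 2-cycle cover are node lists identified by their index, and `w(u, v)` returns 1 or 2). The snippet links every cycle shorter than $k$ to the nodes outside it that it reaches by a weight-one edge, computes a maximum matching with Kuhn's augmenting-path algorithm, and returns $F$ as a dictionary from cycle index to cycle index. Kuhn's algorithm is chosen here only for brevity; matching the $O(n^{5/2})$ running time stated for the algorithm would require a faster method, for instance that of Hopcroft and Karp.

```python
def build_F(cycles, w, k):
    """Return F as {index of a matched short cycle: index of the cycle containing its partner}."""
    cycle_of = {v: idx for idx, c in enumerate(cycles) for v in c}
    short = [idx for idx, c in enumerate(cycles) if len(c) < k]
    # edge (c, v) of B: v lies outside c and some u in c has a weight-one edge (u, v)
    nbrs = {idx: [v for v in cycle_of
                  if cycle_of[v] != idx and any(w(u, v) == 1 for u in cycles[idx])]
            for idx in short}
    match_node = {}                      # node v -> short cycle matched to v

    def augment(c, seen):
        for v in nbrs[c]:
            if v not in seen:
                seen.add(v)
                if v not in match_node or augment(match_node[v], seen):
                    match_node[v] = c
                    return True
        return False

    for c in short:
        augment(c, set())
    # F(c) is the cycle containing the matched node; it is never c itself (no loops)
    return {c: cycle_of[v] for v, c in match_node.items()}

E = {(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 6), (6, 4), (1, 4), (3, 0)}

def w(u, v):
    return 1 if (u, v) in E else 2

print(build_F([[0, 1], [2, 3], [4, 5, 6]], w, 3))  # {0: 2, 1: 0}
```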

Lemma 2. Any loopless partial function F has a spanning subgraph S consisting solely 
of node-disjoint trees of depth one, paths of length two, and isolated nodes such that any 
node in the domain of F is not an isolated node of S. Such a spanning subgraph S can 
be found in polynomial time. 

Proof. Every weakly connected component of F is either a cycle possibly with some 
trees converging into it, a tree (whose root r is not contained in the domain of F), or an 
isolated node (which is also not contained in the domain of F). It suffices to prove the 
lemma for each weakly connected component of F. 

The case where a component is a cycle with some trees converging into it follows
from Papadimitriou and Yannakakis [?, Lem. 2]. In this case, no isolated nodes arise.

In the case of a tree, we take a leaf that has maximum distance from the root r. Let 
s be the successor of that leaf. If s equals r, then the component considered is a tree of 
depth one and we are done. Otherwise, we build a tree of height one with root s and all 
predecessors of s as leaves, remove this tree, and proceed inductively. We end up with a 
collection of node disjoint trees of height one and possibly one isolated node, the root r 
of the component. Since r is not contained in the domain of F, this case is completed. 

If a node is isolated, then this node is not contained in the domain of F, because F 
is loopless. Again, we are done. 

This decomposition can be computed in polynomial time. □ 
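The tree case of this proof translates directly into code. The Python sketch below (names hypothetical) takes one tree component, given as a dictionary `F` from each non-root node to its successor, repeatedly picks a node of maximum distance from the root, and cuts off the tree of height one formed by that node's successor and all remaining predecessors of the successor. It does not cover components that contain a cycle, for which the proof relies on Papadimitriou and Yannakakis.

```python
def decompose_tree(F, root):
    """Split a tree component of a loopless partial function into trees of height one
    (center, leaves) and at most one isolated node, the root. F maps node -> successor."""
    depth = {root: 0}

    def d(v):                                  # distance from v to the root
        if v not in depth:
            depth[v] = 1 + d(F[v])
        return depth[v]

    remaining = set(F) | {root}
    stars, isolated = [], []
    while remaining:
        leaf = max(sorted(remaining), key=d)   # a deepest remaining node; it is a leaf
        if leaf == root:                       # only the root is left
            isolated.append(root)
            remaining.discard(root)
            continue
        s = F[leaf]
        leaves = sorted(u for u in remaining if F.get(u) == s)
        stars.append((s, leaves))
        remaining -= {s, *leaves}
    return stars, isolated

print(decompose_tree({"a": "r", "b": "a", "c": "r"}, "r"))  # ([('a', ['b']), ('r', ['c'])], [])
print(decompose_tree({"a": "r", "b": "a"}, "r"))            # ([('a', ['b'])], ['r'])
```

As the lemma requires, only the root, which is not in the domain of $F$, can end up isolated.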

A refined patching procedure. We compute a decomposition of the directed graph $F$
according to Lemma 2. Isolated nodes of this decomposition correspond to elements of
$\mathcal{C}$ not in the domain of $F$, i.e. either cycles of length at least $k$ or unmatched cycles from
$\mathcal{C}_{<k}$. The former ones, call them $\mathcal{C}_{\ge k}$, fulfil the requirements of a $k$-cycle cover, thus we
can ignore them in the subsequent considerations. We denote the latter ones by $\mathcal{C}^0_{<k}$. The
cycles in $\mathcal{C}^0_{<k}$ have length strictly less than $k$. We merge those cycles into one long cycle
$d$, breaking an edge of weight two whenever possible.
Fig. 2. Trees of height one




Fig. 3. Paths of length two 



Next, we consider the trees of height one. Let $c$ be the root of such a tree and
$c_{i_1}, \ldots, c_{i_m} \in \mathcal{C}_{<k}$ be its leaves. For each cycle $c_{i_\mu}$, there is an edge from $c_{i_\mu}$
to a node $v_\mu$ of $c$. By construction, these nodes $v_1, \ldots, v_m$ are pairwise distinct. We
merge $c_{i_1}$ and $c$ as depicted in Fig. 2. We call this new cycle again $c$ and incorporate the
remaining cycles $c_{i_2}, \ldots, c_{i_m}$ in the same fashion. (The node $v$ in Fig. 2 may be some
$v_{\mu'}$, but this does not matter.) After that we merge $c$ with $d$. In $c$, we break one of the
edges drawn dashed in Fig. 2. In $d$, we discard an edge that does not belong to a cycle
in $\mathcal{C}^0_{<k}$, i.e. one that has been added during the merging process. We call this new cycle again $d$.

Finally, we consider the paths of length two. The three cycles corresponding to such
a path are merged as shown in Fig. 3. (The end node of the edge $e$ and the start node
of edge $f$ may coincide. Moreover, the two removed edges of the cycle in the middle
may coincide. In the latter case, we only incur weight one instead of weight two.) The
resulting cycle will be merged with $d$ as described above in the case of a tree.

At the end of this procedure, we are left with the cycle $d$ and the cycles in $\mathcal{C}_{\ge k}$. If the
cycle $d$ still has length strictly less than $k$, we break one cycle $b \in \mathcal{C}_{\ge k}$ and merge it with
$d$. The resulting cycle has length at least $k$. If possible, we choose $b$ such that $b$ contains
an edge of weight two and break this edge.

Analysis. The algorithm runs in polynomial time. On a unit-cost RAM (with all used
numbers bounded by a polynomial in $n$), the 2-cycle cover and the bipartite matching
can be computed in time $O(n^{5/2})$; see e.g. Papadimitriou and Steiglitz [?]. The decomposition of $F$ and the cycle patching can be done in time $O(n^2)$ in a straightforward
manner. Thus, the overall running time is $O(n^{5/2})$.

We proceed with estimating the approximation performance. 

Lemma 3. For $n \ge 12$, the $k$-cycle cover produced by the algorithm has weight no
worse than 4/3 times the weight of an optimal $k$-cycle cover.

Proof. Let $C_{\mathrm{opt}}$ be an optimal $k$-cycle cover of $G$. Since $C_{\mathrm{opt}}$ is also a 2-cycle cover of
$G$, we have $w(C) \le w(C_{\mathrm{opt}}) = n + \ell$. The cost of the $k$-cycle cover produced by the
algorithm is $w(C)$ plus the extra costs due to the mergings.
First, when merging the cycles in $\mathcal{C}^0_{<k}$ to form the cycle $d$, we incur an extra cost of
one for each $c \in \mathcal{C}^0_{<k}$.

Next, we consider the merging as shown in Fig. 2. We charge the costs to $v_\mu$ and
the nodes of $c_{i_\mu}$. These are at least three nodes. Since the edge of $F$ has weight one, the
cost of this merging is at most 1/3 per node involved. The merging of $c$ with $d$ is free of
costs, since we only break edges we have already paid for when forming $c$ and $d$.

In the case depicted in Fig. 3, we charge the costs of the merging to the nodes of the
three cycles. These are at least six nodes. Altogether, the cost of this merging is again at
most 1/3 per node involved. As above, the merging with $d$ is free of costs.

It is clear that each node is only charged once this way. For the moment, assume that
the cycle $d$ has length at least $k$, thus an additional merging is not needed. Let $n_2$ be
the total number of nodes contained in the cycles from $\mathcal{C}^0_{<k}$. The weight $w(C_{\mathrm{apx}})$ of the
$k$-cycle cover $C_{\mathrm{apx}}$ produced by the algorithm is at most

$$w(C_{\mathrm{apx}}) \le n + \ell + \tfrac{1}{3}(n - n_2) + |\mathcal{C}^0_{<k}|. \qquad (1)$$

We have $n_2 \ge 2 \cdot |\mathcal{C}^0_{<k}|$ and, by Lemma 1, $|\mathcal{C}^0_{<k}| \le \ell$. Hence $w(C_{\mathrm{apx}}) \le \frac{4}{3}(n + \ell)$.
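Spelled out, the last step combines the two inequalities by splitting $|\mathcal{C}^0_{<k}|$ into one third and two thirds. This derivation assumes, as the preceding cost accounting indicates, that bound (1) reads $w(C_{\mathrm{apx}}) \le n + \ell + \frac{1}{3}(n - n_2) + |\mathcal{C}^0_{<k}|$, where $\mathcal{C}^0_{<k}$ is the set of unmatched short cycles and $n_2$ the number of their nodes:

$$\begin{aligned}
|\mathcal{C}^0_{<k}| &= \tfrac{1}{3}|\mathcal{C}^0_{<k}| + \tfrac{2}{3}|\mathcal{C}^0_{<k}| \le \tfrac{1}{3}\ell + \tfrac{1}{3}n_2,\\
w(C_{\mathrm{apx}}) &\le n + \ell + \tfrac{1}{3}(n - n_2) + \tfrac{1}{3}\ell + \tfrac{1}{3}n_2 = \tfrac{4}{3}(n + \ell).
\end{aligned}$$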

If $d$ has length strictly less than $k$, then one additional merging is needed. This yields
an approximation ratio of $4/3 + \epsilon$ for any $\epsilon > 0$. We can get rid of the $\epsilon$ by refining the
analysis as follows. Either the merging of $b \in \mathcal{C}_{\ge k}$ and $d$ is free of costs, since $b$
contains an edge of weight two, or all cycles in $\mathcal{C}_{\ge k}$ consist solely of weight one edges.
Since $\mathcal{C}_{\ge k}$ is nonempty, these are at least $n/2$ edges. The cycle $d$ contains at least half of
the original edges of the merged cycles. Hence $d$ and the cycles in $\mathcal{C}_{\ge k}$ contain at least
a fraction of $3/4$ of the edges of the 2-cycle cover $C$. Thus, after the last merging step,
we have a cycle cover of weight at most $\frac{5}{4}(n + \ell) + 1 \le \frac{4}{3}(n + \ell)$ for $n \ge 12$. □

Theorem 1. There is a factor 4/3 approximation algorithm for Min-$k$-DCC running in
time $O(n^{5/2})$ for any $k \ge 3$. □

Corollary 1. There is a factor 4/3 approximation algorithm for the asymmetric travelling salesperson problem with distances one and two running in time $O(n^{5/2})$. □

Corollary 2. There is a factor 2/3 approximation algorithm for the node-disjoint path
packing problem in directed graphs running in time $O(n^{5/2})$.

Proof. We transform a given directed graph $G$ into a complete loopless directed graph
$H$ with edge weights one and two by assigning edges of $G$ weight one and "nonedges"
weight two. The details are spelled out by Vishwanathan [?, Sect. 2]. □
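The conversion between cycle covers and path packings behind Corollary 2 (and the remark on simple 2-factors in the introduction) can be illustrated with a Python sketch (the helper name `cover_to_path_packing` is hypothetical): deleting the weight-two edges of every cycle leaves node-disjoint paths that use only edges of $G$, and a cycle consisting solely of weight-one edges is opened at one arbitrary edge. This only illustrates the conversion; the argument for the approximation ratio is the one in Vishwanathan's paper.

```python
def cover_to_path_packing(cycles, w):
    """Delete all weight-two edges of each cycle (or one edge if there are none) and
    return the resulting node-disjoint paths as node lists."""
    paths = []
    for c in cycles:
        L = len(c)
        heavy = [i for i in range(L) if w(c[i], c[(i + 1) % L]) == 2]
        cuts = heavy if heavy else [L - 1]   # cut edges c[i] -> c[i+1]
        # each path runs from just after one cut edge up to the start of the next one
        for a, b in zip(cuts, cuts[1:] + [cuts[0] + L]):
            paths.append([c[j % L] for j in range(a + 1, b + 1)])
    return paths

E = {(0, 1), (2, 3), (4, 5), (5, 6), (6, 4)}

def w(u, v):
    return 1 if (u, v) in E else 2

paths = cover_to_path_packing([[0, 1, 2, 3], [4, 5, 6]], w)
print(paths, sum(len(p) - 1 for p in paths))  # [[2, 3], [0, 1], [4, 5, 6]] 4
```

Here the cover has weight $9 = n + \ell$ with $n = 7$ and $\ell = 2$; the packing keeps $n - \ell - 1 = 4$ edges, one edge being lost in the cycle that had no weight-two edge.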

3 Approximation Algorithms for Undirected Cycle Covers 

We outline factor 7/6 approximation algorithms for Min-$k$-UCC for any $k \ge 5$. In particular, we recover the factor 7/6 approximation algorithm for the symmetric travelling
salesperson problem with distances one and two of Papadimitriou and Yannakakis [?,
Thm. 2] by choosing $k > n/2$. The algorithm is quite similar to the directed case, so we
confine ourselves to pointing out the differences.
Computing an optimal 4-cycle cover. Instead of starting with a minimum weight 2-cycle cover, we exploit Hartvigsen's polynomial time algorithm [?] for computing a
minimum weight 4-cycle cover $C$. This gives us the inequality $n_2 \ge 4 \cdot |\mathcal{C}^0_{<k}|$ (instead
of $n_2 \ge 2 \cdot |\mathcal{C}^0_{<k}|$ in the directed case).

Auxiliary edges. A little more care is necessary when collecting auxiliary edges via the
matching in $B$, since for each weight two edge, we may now only spend an extra amount
of 1/6 instead of 1/3. We normalize the computed 4-cycle cover $C$ as follows: first, we
may assume that there is only one cycle $t$ with weight two edges, since we may merge
two such cycles without any costs. Second, we may assume that for each weight two
edge $\{u, v\}$ of $t$ there is no weight one edge $\{v, x\}$ in $G$ for some node $x$ of a different
cycle, because otherwise we may merge this cycle with $t$ at no cost. We now may bound
the number of nodes for which we have to charge extra costs of 1/6 in (1) by $n - n_2 - \ell$
instead of $n - n_2$. This is due to the fact that if $t$ is the root of a tree of height one
according to the decomposition of Lemma 2, then at least $\ell$ nodes of $t$ are unmatched
because of the second property of $t$ mentioned above. Altogether, the total weight is
$w(C_{\mathrm{apx}}) \le n + \ell + \frac{1}{6}(n - n_2 - \ell) + |\mathcal{C}^0_{<k}| \le \frac{7}{6}(n + \ell)$.
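The same splitting argument as in the directed case explains the factor 7/6. Assuming the bound reads $w(C_{\mathrm{apx}}) \le n + \ell + \frac{1}{6}(n - n_2 - \ell) + |\mathcal{C}^0_{<k}|$, and using $n_2 \ge 4 \cdot |\mathcal{C}^0_{<k}|$ together with $|\mathcal{C}^0_{<k}| \le \ell$, we get

$$\begin{aligned}
|\mathcal{C}^0_{<k}| &= \tfrac{1}{3}|\mathcal{C}^0_{<k}| + \tfrac{2}{3}|\mathcal{C}^0_{<k}| \le \tfrac{1}{3}\ell + \tfrac{1}{6}n_2,\\
w(C_{\mathrm{apx}}) &\le n + \ell + \tfrac{1}{6}(n - n_2 - \ell) + \tfrac{1}{3}\ell + \tfrac{1}{6}n_2 = \tfrac{7}{6}(n + \ell).
\end{aligned}$$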



Decomposition of functions. The decomposition according to Lemma 2 works without
any changes in the undirected case.

A refined patching procedure. Here we use the patching procedure as devised by Papadimitriou and Yannakakis [?, Fig. 2], which is only suited for the undirected case.
Together with the fact that each involved cycle has at least four nodes (instead of two in
the directed case), we obtain the lower merging cost of 1/6 per node.

Applying the above mentioned modifications to our algorithm for the directed case,
we get the following theorem.

Theorem 2. For any $k \ge 5$, there is a polynomial time factor 7/6 approximation algorithm for Min-$k$-UCC. □



4 Lower Bounds 

4.1 NP-Completeness of 3-DCC

To show the NP-completeness of 3-DCC we will reduce 3-Dimensional Matching
(3DM) to this problem. Consider a hypergraph $H = (W, X)$ with $W = W^0 \cup W^1 \cup W^2$
and $X \subseteq W^0 \times W^1 \times W^2$. The sets $W^0, W^1, W^2$ are disjoint and of the same size,
$W^k = \{w^k_1, \ldots, w^k_n\}$. 3DM is the question whether there exists a subset $X' \subseteq X$ such
that each element of $W$ appears in exactly one element of $X'$ (perfect 3-dimensional
matching). 3DM is known to be NP-complete (Garey, Johnson [?]).

We construct a graph $G = (V, E)$ such that $G$ has a 3-cycle cover iff $H$ has a perfect
3-dimensional matching. Let $W_\Delta = (W^0 \times W^1) \cup (W^1 \times W^2) \cup (W^2 \times W^0)$. For
$k = 0, 1, 2$ and $j = 1, \ldots, n$ let $U^k_j = \{u^k_j[\ell, q] \mid \ell = 1, \ldots, n - 1 \wedge q = 1, 2, 3\}$ be a
set of helper nodes for $w^k_j$. The set of nodes $V$ is given by $V = W_\Delta \cup \bigcup_{k=0}^{2} \bigcup_{j=1}^{n} U^k_j$.



Fig. 4. The subgraph connecting $(w^k_j, w^{k+1}_\ell)$ and $(w^k_j, w^{k+1}_{\ell+1})$ via three helper nodes $u^k_j[\ell, 1]$, $u^k_j[\ell, 2]$, $u^k_j[\ell, 3]$.



For each edge $(w^0, w^1, w^2) \in X$ we construct three edges $((w^0, w^1), (w^1, w^2))$,
$((w^1, w^2), (w^2, w^0))$, $((w^2, w^0), (w^0, w^1)) \in E$ connecting the corresponding elements of $W_\Delta$. Furthermore, two nodes $(w^k_j, w^{k+1}_\ell)$ and $(w^k_j, w^{k+1}_{\ell+1})$ are
connected via helper nodes as shown in Fig. 4. In the following we write $k + 1$ instead
of $(k + 1) \bmod 3$ for short.

We divide the set $W_\Delta$ into subsets $F^k_j = \{(w^k_j, w^{k+1}_\ell) \mid \ell = 1, \ldots, n\}$. The subset
$F^k_j$ contains the nodes that represent $w^k_j$.
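The part of the construction that encodes the hyperedges is easy to state programmatically. In the Python sketch below (the name `hyperedge_arcs` is hypothetical, and the helper-node gadgets of Fig. 4 are deliberately omitted), each hyperedge $(w^0, w^1, w^2) \in X$ contributes three edges between pair nodes of $W_\Delta$. Following them from $(w^0, w^1)$ returns to the start after exactly three steps, which is why a hyperedge of a perfect matching yields a cycle of length 3 in the proof of Theorem 3 below.

```python
def hyperedge_arcs(X):
    """Edges of E contributed by the hyperedges X (triples (x0, x1, x2) with x_k in W^k)."""
    E = set()
    for x0, x1, x2 in X:
        E |= {((x0, x1), (x1, x2)), ((x1, x2), (x2, x0)), ((x2, x0), (x0, x1))}
    return E

X = [("a1", "b1", "c1"), ("a1", "b2", "c2")]
E = hyperedge_arcs(X)
succ = dict(hyperedge_arcs([X[0]]))      # successor map of the first hyperedge's triangle
node, walk = ("a1", "b1"), []
for _ in range(3):
    walk.append(node)
    node = succ[node]
print(len(E), walk, node == ("a1", "b1"))
```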

Assume that $G$ has a 3-cycle cover $C$. We call a helper node $u^k_j[\ell, 2]$ and a node of
$W_\Delta$ companions if they are part of the same cycle in $C$. Due to the construction either
$(w^k_j, w^{k+1}_\ell)$ or $(w^k_j, w^{k+1}_{\ell+1})$ is the only companion of $u^k_j[\ell, 2]$. Hence, the following
lemma holds.

Lemma 4. Assume $G$ has a 3-cycle cover $C$. For any $w^k_j \in W$ exactly $n - 1$ of the $n$
nodes in $F^k_j$ have a companion. □

We say that the only node $(w^k_j, w^{k+1}_\ell) \in F^k_j$ that has no companion participates for
$w^k_j$. Now we are prepared to prove the NP-completeness of 3-DCC.

Theorem 3. 3-DCC is NP-complete.

Proof. Given a hypergraph $H$ we construct a graph $G$ as described above.

Assume $H$ has a perfect 3-dimensional matching $X' \subseteq X$. Then $G$ has the following 3-cycle
cover. For any $(w^0, w^1, w^2) \in X'$ let $(w^0, w^1)$, $(w^1, w^2)$, and $(w^2, w^0)$ participate for
$w^0$, $w^1$, and $w^2$, respectively. These three nodes form a cycle of length 3. Let $(w^k_j, w^{k+1}_\ell)$ be the
node that participates for $w^k_j$. Then for $\ell' < \ell$ the nodes $u^k_j[\ell', 2]$ and $(w^k_j, w^{k+1}_{\ell'})$, and
for $\ell' > \ell$ the nodes $u^k_j[\ell' - 1, 2]$ and $(w^k_j, w^{k+1}_{\ell'})$, are companions. Thus, for $\ell' < \ell$ the
nodes $(w^k_j, w^{k+1}_{\ell'})$ and $u^k_j[\ell', q]$ ($q = 1, 2, 3$), and for $\ell' > \ell$ the nodes $(w^k_j, w^{k+1}_{\ell'})$ and
$u^k_j[\ell' - 1, q]$ ($q = 1, 2, 3$), form cycles each of length 4. Thus all nodes of $G$ are covered
by a cycle of length at least 3.

On the other hand, assume that $G$ has a 3-cycle cover $C$. Due to Lemma 4 we only
have to take care of the participating nodes in $W_\Delta$. The participating nodes form cycles
whose lengths are multiples of 3. We cut all the edges from $W^1 \times W^2$ to $W^2 \times W^0$. The
remaining paths of length two yield a 3-dimensional matching for $H$.

Noting that 3-DCC is in NP completes the proof. □

By replacing the nodes of $W_\Delta$ by paths and extending the helper node constructions
we obtain the following generalization.

Theorem 4. The problem $k$-DCC is NP-complete for any $k \ge 3$. □
Fig. 5. The clause gadget $G_i$ if $c_i$ consists of (a) three, (b) two, or (c) one literal. The dashed,
dotted, and dash-dotted edges correspond to the first, second, and third literal of $c_i$, respectively.



4.2 Nonapproximability of Min-4-DCC 

In this section we show that Min-4-DCC does not have a polynomial time approximation
scheme (PTAS, see e.g. Ausiello et al. [?]), unless NP = P. For this purpose we reduce
Max-3SAT(3) to this problem. An instance of Max-3SAT is a set F of disjunctive
clauses where each clause consists of at most three literals. Max-3SAT is the problem of
finding the maximum number of simultaneously satisfiable clauses. Max-3SAT(3) is the
restricted version where each variable occurs at most three times in F. We may assume
that each variable occurs at least once positive and at least once negative. Otherwise,
we can eliminate this variable by setting it to the appropriate value. In particular, each
variable occurs twice or three times. Papadimitriou and Yannakakis [16] have shown that
Max-3SAT(3) is MAX SNP-complete. They have presented a reduction from Max-3SAT
to Max-3SAT(3) using so-called regular expanders (see e.g. Ajtai [?]). A set F
of clauses will be called $p$-satisfiable iff $p \cdot |F|$ is the maximum number of satisfiable
clauses in F. Håstad [?] has proven that it is NP-hard to distinguish 1- and $(7/8 + \varepsilon)$-satisfiable
instances of Max-3SAT for any $\varepsilon > 0$. The reduction of Papadimitriou and
Yannakakis and the result of Håstad yield the following lemma.

Lemma 5. There exists a constant $\lambda < 1$ such that it is NP-hard to distinguish 1- and
$\lambda$-satisfiable instances of Max-3SAT(3). □

We reduce Max-3SAT(3) to Min-4-DCC. For this purpose, let $F = \{c_1, \dots, c_t\}$
be a set of disjunctive clauses over variables $U = \{x_1, \dots, x_r\}$. We construct a graph
$G = (V, E)$. For each variable $x_j$ we have one node $u_j \in V$. These nodes will be
called variable nodes. For each clause $c_i$ we have three nodes $v_{i,1}, v_{i,2}, v_{i,3} \in V$. Let
$V_i = \{v_{i,1}, v_{i,2}, v_{i,3}\}$ be the set of these nodes.

In the following we describe how the nodes of G are connected via edges with weight
one. All other edges have weight two. The nodes in $V_i$ are connected via a cycle as shown
in Fig. 5. The subgraph induced by $V_i$ will be called the clause gadget $G_i$. The clause
gadgets and the variable nodes are connected as follows. Each variable node $u_j$ has two
incoming and two outgoing edges: $e^{\mathrm{in}}_{j,+}, e^{\mathrm{out}}_{j,+}$ representing the literal $x_j$, and $e^{\mathrm{in}}_{j,-}, e^{\mathrm{out}}_{j,-}$
representing $\bar{x}_j$, as depicted in Fig. 6a. If $c_i$ is the first clause in which the literal $x_j$ appears,
then the edge $e^{\mathrm{out}}_{j,+}$ is identical with either $f^{\mathrm{in}}_{i,1}$, $f^{\mathrm{in}}_{i,2}$, or $f^{\mathrm{in}}_{i,3}$, depending on where $x_j$
occurs in $c_i$. If a literal occurs in more than one clause, then one of the outgoing edges
of the first gadget and one of the incoming edges of the second gadget are identical,
according to where the literal appears in these clauses. If $c_i$ is the last clause in which
the literal $x_j$ appears, then the edge $e^{\mathrm{in}}_{j,+}$ is identical with either $f^{\mathrm{out}}_{i,1}$, $f^{\mathrm{out}}_{i,2}$, or $f^{\mathrm{out}}_{i,3}$,
depending on where $x_j$ occurs in $c_i$. The clauses which contain $\bar{x}_j$ are connected in a
similar fashion. An example is shown in Fig. 6b.

Fig. 6. (a) The edges starting and ending at $u_j$. (b) Example: $x_1$ is the first literal in both $c_1$ and
$c_2$, and $\bar{x}_1$ is the second literal in $c_3$.

Let $C = (V, E_C)$ be a cycle cover of G. We introduce a weight for subsets
$V' \subseteq V$ of nodes. For a single node $z \in V$, the weight $w_C(\{z\})$ of z with respect to the
cycle cover C is half of the weight of its incoming edge plus half of the weight of its
outgoing edge. For $V' \subseteq V$ we set $w_C(V') = \sum_{z \in V'} w_C(\{z\})$. Then $w_C(V) = w(C)$.
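The identity $w_C(V) = w(C)$ holds because every cover edge contributes half its weight to each of its two endpoints. A small Python sketch of this bookkeeping, assuming (as our own representation, not the paper's) that the cover is a successor map and the edge weights are a dict keyed by ordered pairs:

```python
from fractions import Fraction


def node_weights(succ, weight):
    """w_C({z}) for every node z of a cycle cover C given as a successor map:
    half the weight of the cover edge entering z plus half of the one leaving z."""
    pred = {v: u for u, v in succ.items()}
    return {z: Fraction(weight[(pred[z], z)] + weight[(z, succ[z])], 2)
            for z in succ}


def subset_weight(succ, weight, subset):
    """w_C(V') as the sum of the node weights over V'."""
    w = node_weights(succ, weight)
    return sum((w[z] for z in subset), Fraction(0))


def cover_weight(succ, weight):
    """w(C): the total weight of the cover edges."""
    return sum(weight[(u, v)] for u, v in succ.items())


# A 4-cycle using two weight-1 and two weight-2 edges.
succ = {0: 1, 1: 2, 2: 3, 3: 0}
weight = {(0, 1): 1, (1, 2): 2, (2, 3): 1, (3, 0): 2}
print(subset_weight(succ, weight, {1}))                                 # 3/2
print(subset_weight(succ, weight, succ) == cover_weight(succ, weight))  # True
```

Exact fractions are used because node weights are half-integers; this is what later allows the argument that a non-satisfying gadget has weight at least 7/2.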

Let an assignment for the variables U be given that satisfies k of the t clauses of
F. We construct a 4-cycle cover C with weight $w(C) = r + 4t - k$ (each variable node contributes 1,
each of the k satisfied clause gadgets 3, and each of the $t - k$ unsatisfied ones 4) as follows. If
$x_j = \mathrm{true}$ then $e^{\mathrm{in}}_{j,+}, e^{\mathrm{out}}_{j,+} \in E_C$, otherwise $e^{\mathrm{in}}_{j,-}, e^{\mathrm{out}}_{j,-} \in E_C$. If the j-th literal of $c_i$ is
true then $f^{\mathrm{in}}_{i,j}, f^{\mathrm{out}}_{i,j} \in E_C$. For any satisfied clause $c_i$, add some of the edges $(v_{i,1}, v_{i,2})$,
$(v_{i,2}, v_{i,3})$, and $(v_{i,3}, v_{i,1})$ if necessary. The clauses that are not satisfied are connected
in one big cycle. This yields weight 4 per unsatisfied clause. Every node in C has indegree
1 and outdegree 1. Hence C is a cycle cover. If a clause $c_i$ is satisfied by only one literal,
then the cycle passing through $G_i$ contains two edges within $G_i$ as well as one incoming and one
outgoing edge. If a clause is satisfied by more than one literal, then the cycle passes through
at least two variable nodes. Thus, the obtained cycle cover is a 4-cycle cover. The case
in which only one clause is not satisfied by the assignment is special, since then
we cannot form one big loop of the unsatisfied clauses. But this case is negligible for
sufficiently large instances of Max-3SAT(3), since we are interested in approximability.

For the other direction, consider a 4-cycle cover C of G and a clause gadget $G_i$
with $w_C(V_i) = 3$. Such a gadget will be called satisfying. Any clause gadget that is not
satisfying yields weight at least 7/2. It holds that $G_i$ is satisfying iff all edges that start
or end in $G_i$ have weight one. Since all cycles must have length at least 4, we have the
following lemma.

Lemma 6. Let $C = (V, E_C)$ be an arbitrary 4-cycle cover of G and $G_i$ be a satisfying
clause gadget in C. Then the following properties hold:

1. At least two of the edges $f^{\mathrm{in}}_{i,1}, f^{\mathrm{out}}_{i,1}, \dots, f^{\mathrm{out}}_{i,3}$ are in $E_C$.

2. For $j = 1, 2, 3$: $f^{\mathrm{in}}_{i,j} \in E_C \iff f^{\mathrm{out}}_{i,j} \in E_C$. □

A satisfying clause gadget $G_i$ yields a partial assignment for the variables of $c_i$. If
$f^{\mathrm{in}}_{i,j}, f^{\mathrm{out}}_{i,j} \in E_C$, then the j-th literal of $c_i$ is set true and the corresponding variable is
assigned an appropriate value; otherwise we do not assign a value to this literal. Due to
Lemma 6, the obtained partial assignment satisfies $c_i$.

By considering all satisfying clause gadgets, we step by step obtain a partial assignment
that satisfies at least those clauses whose gadgets are satisfying. The following
lemma ensures that the obtained partial assignment is consistent, i.e., we never have to
assign both true and false to one variable.

Lemma 7. Let C be an arbitrary 4-cycle cover of G. There are no two satisfying clause
gadgets $G_i$ and $G_{i'}$ and a variable $x_j$ such that $x_j$ has to be assigned different values
according to these clause gadgets.

Proof. If $x_j$ has to be assigned different values, then $x_j$ occurs positive in $c_i$ and negative
in $c_{i'}$ or vice versa. We only consider the first case. By symmetry, we can restrict ourselves
to the case where the literal $\bar{x}_j$ occurs exactly once in F and $x_j$ is the first variable
in both clauses. According to $G_{i'}$, the literal $\bar{x}_j$ has to be assigned true. Since $c_{i'}$ is the
only clause that contains $\bar{x}_j$, the two edges $e^{\mathrm{in}}_{j,-} = f^{\mathrm{out}}_{i',1}$ and $e^{\mathrm{out}}_{j,-} = f^{\mathrm{in}}_{i',1}$ belong to $E_C$.
On the other hand, $f^{\mathrm{in}}_{i,1}$ and $f^{\mathrm{out}}_{i,1}$ belong to $E_C$. Since $x_j$ occurs at most twice positive
in F, at least one of the edges $f^{\mathrm{in}}_{i,1}$ and $f^{\mathrm{out}}_{i,1}$ connects $V_i$ to $u_j$. Thus, $u_j$ has indegree at
least 2 or outdegree at least 2, a contradiction. □

Each variable node yields weight at least 1. Each satisfying clause gadget yields
weight 3. All other clause gadgets yield weight at least 7/2.

The following theorem proves that a constant $\xi > 1$ exists such that Min-4-DCC
cannot be approximated in polynomial time with performance ratio $\xi$, unless NP = P.
Thus, Min-4-DCC does not have a PTAS, unless NP = P.

Theorem 5. There exists a constant $\xi > 1$ such that it is NP-hard to distinguish
instances $G = (V, E)$ of Min-4-DCC whose minimum cycle cover has weight $|V|$ and
instances whose minimum cycle cover has weight at least $\xi \cdot |V|$.

Proof. Due to the reduction described above, a 1-satisfiable instance of Max-3SAT(3)
yields a graph which has a 4-cycle cover with weight $|V| = r + 3t$. On the other
hand, every 4-cycle cover of a graph corresponding to a $\lambda$-satisfiable instance has
weight at least $K_\lambda = r + 3\lambda t + \tfrac{7}{2}(1 - \lambda)t$. Since every clause consists of at most
three literals and every variable appears at least twice, we have $2r \le 3t$, i.e., $r/t \le 3/2$.
Since $3\lambda + \tfrac{7}{2}(1 - \lambda) > 3$, the ratio $K_\lambda/(r + 3t)$ decreases as r grows, so it is smallest
for $r = \tfrac{3}{2}t$. Therefore, the following inequality holds:

$$\frac{K_\lambda}{r + 3t} \;\ge\; \frac{\left(\tfrac{3}{2} + 3\lambda + \tfrac{7}{2}(1 - \lambda)\right) t}{\left(\tfrac{3}{2} + 3\right) t} \;=\; \frac{10 - \lambda}{9} \;=\; \xi \;>\; 1.$$

Thus, deciding whether the minimum 4-cycle cover of a graph has weight $|V|$ or
weight at least $\xi \cdot |V|$ is at least as hard as distinguishing 1- and $\lambda$-satisfiable instances of
Max-3SAT(3). This completes the proof due to Lemma 5. □
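The final inequality can be sanity-checked with exact rational arithmetic. The sketch below is our own illustration: it uses an arbitrarily chosen $\lambda = 9/10$ (not a value from the paper), evaluates $K_\lambda/(r + 3t)$ for every admissible $r \le 3t/2$, and confirms that the minimum is attained at $r = 3t/2$ and equals $(10 - \lambda)/9$.

```python
from fractions import Fraction


def lower_bound_ratio(lam, r, t):
    """K_lambda / (r + 3t) with K_lambda = r + 3*lam*t + (7/2)*(1 - lam)*t."""
    k_lam = r + 3 * lam * t + Fraction(7, 2) * (1 - lam) * t
    return k_lam / (r + 3 * t)


def xi(lam):
    """The constant xi = (10 - lambda) / 9 from the proof of Theorem 5."""
    return (10 - lam) / 9


lam, t = Fraction(9, 10), 10
ratios = {r: lower_bound_ratio(lam, r, t) for r in range(1, 3 * t // 2 + 1)}
r_min = min(ratios, key=ratios.get)
print(r_min, ratios[r_min], xi(lam))  # 15 91/90 91/90
```

The check only illustrates the algebra for one choice of parameters; the proof itself covers all $\lambda < 1$.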

If we replace the variable nodes by paths of length $k - 4$, we obtain the result that
Min-k-DCC does not have a PTAS for any $k \ge 4$.

Theorem 6. For any $k \ge 4$ there exists a constant $\xi_k > 1$ such that Min-k-DCC cannot
be approximated with performance ratio $\xi_k$, unless NP = P. □

We can transform a directed graph into an undirected graph by replacing each node
with three nodes (see e.g. Hopcroft and Ullman [?]). Applying this transformation to
the graph constructed in this section, we obtain the following theorem.

Theorem 7. Min-k-UCC does not have a PTAS for any $k \ge 12$, unless NP = P. □
5 Conclusions and Open Problems

We have presented factor 4/3 and 7/6 approximation algorithms for Min-k-DCC and
Min-k-UCC, respectively, with polynomial running time (independent of k). On the
other hand, we have shown that k-DCC is NP-complete for $k \ge 3$ and Min-k-DCC
does not possess a PTAS for $k \ge 4$, unless NP = P. The status of Min-3-DCC is
open. We strongly conjecture that this problem also has no PTAS, unless NP = P. In
the undirected case, Papadimitriou has shown NP-hardness of k-UCC for $k \ge 6$. The
complexity of 5-UCC and the approximability of Min-k-UCC for $5 \le k \le 11$ remain
open.
Abstract. The cutwidth of a graph G is defined as the smallest integer
k such that the vertices of G can be arranged in a vertex ordering
$[v_1, \dots, v_n]$ in a way that, for every $i = 1, \dots, n - 1$, there are
at most k edges with one endpoint in $\{v_1, \dots, v_i\}$ and the other
in $\{v_{i+1}, \dots, v_n\}$. We examine the problem of computing in polynomial
time the cutwidth of a partial w-tree with bounded degree. In particular,
we show how to construct an algorithm that, in $n^{O(w^2 d)}$ steps, computes
the cutwidth of any partial w-tree with vertices of degree bounded by a
fixed constant d. Our algorithm is constructive in the sense that it can
be adapted to output the corresponding optimal vertex ordering. Also,
it is the main subroutine of an algorithm computing the pathwidth of a
bounded degree partial w-tree in $n^{O((wd)^2)}$ steps.



der contract number IST-99-14186 (ALCOM-FT) and, for the first two authors, by
the Spanish CYCIT TIC-2000-1970-CE. The work of the first author was partially
supported by the Ministry of Education and Culture of Spain, Grant number
MEC-DGES SB98 0K148809.

1 Introduction

A wide variety of optimization problems can be formulated as layout or vertex
ordering problems. In many cases, such a problem asks for the optimal value of
some function defined over all the linear orderings of the vertices or the edges of
a graph (for a survey, see [?]). One of the best known problems of this type is
the problem of computing the cutwidth of a graph. It is also known as the Minimum
Cut Linear Arrangement problem and has several applications such
as VLSI design, network reliability, automatic graph drawing, and information
retrieval. Cutwidth has been extensively examined, and it appears to be closely
related to other graph parameters like pathwidth, linear-width, bandwidth,
and modified bandwidth. Briefly, the cutwidth of a graph G = (V(G), E(G)) is
equal to the minimum k for which there exists a vertex ordering of G such that
for any 'gap' (place between two successive vertices) of the ordering, there are
at most k edges crossing the gap. Computing cutwidth is an NP-complete problem,
and it remains NP-complete even if the input is restricted to planar graphs
with maximum degree 3. There is a polynomial time approximation algorithm
with a ratio of $O(\log |V(G)| \log\log |V(G)|)$, and there is a polynomial time approximation
scheme if $|E(G)| = \Theta(|V(G)|^2)$. Relatively little work has been done
on detecting special graph classes where computing cutwidth can be done in
polynomial time (for complete references to the aforementioned results, see [?]).
In [?], an algorithm was given that computes the cutwidth of any tree with maximum
degree bounded by d in $O(n (\log n)^{d-2})$ time. This result was improved in
1983 by Yannakakis [?], who presented an $O(n \log n)$ algorithm computing the
cutwidth of any tree. Since then, the only polynomial algorithms reported for
the cutwidth of graph classes different from trees concerned special cases such
as hypercubes [?] and b-dimensional c-ary cliques [?]. In this paper, we move
one step further, presenting a polynomial time algorithm for the cutwidth of
bounded degree graphs with small treewidth.

The notions of treewidth and pathwidth appear to play a central role in many
areas of graph theory. Roughly, a graph has small treewidth if it can be constructed
by assembling small graphs together in a tree structure, namely a tree
decomposition of small width (graphs with treewidth at most w are alternatively
called partial w-trees; see Subsection 2.1 for the formal definitions). A wide
variety of graph classes appear to have small treewidth, such as trees, outerplanar
graphs, series parallel graphs, and Halin graphs (for a detailed survey of classes
with bounded treewidth, see [?]). The pathwidth of a graph is defined similarly
to treewidth, but now the tree in its definition is required to be a simple line
(path). That way, treewidth can be seen as a "tree"-generalization of pathwidth.
Pathwidth and treewidth were introduced by Robertson and Seymour in [?]
and served as some of the cornerstones of their lengthy proof of the Wagner
conjecture, known now as the Graph Minors Theorem (for a survey see [?]).
Treewidth appears to have interesting applications in algorithmic graph theory.
In particular, a wide range of otherwise intractable combinatorial problems are
polynomially, even linearly, solvable when restricted to graphs with bounded
treewidth or pathwidth. In this direction, numerous techniques have been developed
in order to construct dynamic programming algorithms making use of the
"tree" or "line" structure of the input graph (see e.g. [?]). The results of this
paper show how these techniques can be used for constructing a polynomial time
algorithm for the cutwidth of partial w-trees with vertices of degrees bounded
by fixed constants. Our algorithm is a non-trivial extension of the linear time
algorithm in [?] concerning the parameterized version of the cutwidth problem.

The parameterized version of the cutwidth problem asks whether the
cutwidth of a graph is at most k, where k is a fixed small constant. This problem
is known to be solvable in polynomial time. In particular, the first polynomial
algorithm for fixed k was given by Makedon and Sudborough in [?], where an
$O(n^{k-1})$ dynamic programming algorithm is described. This time complexity
has been considerably improved by Fellows and Langston in [?], where, among
others, they prove that, for any fixed k, an $O(n^3)$ algorithm can be constructed
checking whether a graph has cutwidth at most k. Furthermore, a technique introduced
in [?] (see also [?]) further reduced the bound to $O(n^2)$, while in [?] a
general method is given to construct a linear time algorithm that decides whether
a given graph has cutwidth at most k, for k constant. Finally, in [?], an explicit
constructive linear time algorithm was presented that is able to output the optimal
vertex ordering in case of a positive answer. This algorithm is based on the fact
that graphs with small cutwidth also have small pathwidth, and it develops further
the techniques in [?] in order to use a bounded-width path decomposition
for computing the cutwidth of G.

This paper extends the algorithm in [?] in the sense that it uses all of its
subroutines and it solves the problem of [?] for graphs with bounded treewidth.
Although this extension is not really useful for the parameterized version of
cutwidth, it appears to be useful for solving the general cutwidth problem
for partial w-trees of bounded degree. This is possible due to the observation
that the "hidden constants" of all the subroutines of our algorithm remain polynomial
even when we ask whether G has cutwidth at most $O(dw) \cdot \log n$. As this
upper bound for cutwidth is indeed satisfied, our algorithm is able to compute
in $n^{O(w^2 d)}$ steps the cutwidth of bounded degree partial w-trees.

A main technical contribution of this paper is Algorithm Join-Node in Section 3.
This algorithm uses the "small treewidth" property of the input graph.
It is used as an important subroutine in the algorithm for the main result. Section 2
contains the main definitions and lemmata supporting the operation of
Join-Node. Subsection 2.1 contains the definitions of treewidth, pathwidth and
cutwidth. Most of the preliminary results of Subsection 2.2 concern operations
on sequences of integers, and the definitions of the most elementary of them were
introduced in [?] and [?] (see also [?]). Also, the main tool for exploiting
the small treewidth of the input graph is the notion of the characteristic of a
vertex ordering, introduced in [?] and defined in Subsection 2.3 of this paper.
For the above reasons, we use notation compatible with the one used in [?]. In
this extended abstract we omit all the proofs; they can be found in [?].

Algorithm Join-Node only helps to compute the cutwidth of a bounded degree
partial w-tree G but not to construct the corresponding vertex ordering. In the
full version of this paper [?], we describe how to transform this algorithm into a
constructive one, in the sense that we can then output a linear arrangement of
G with optimal cutwidth. This uses the analogous constructions of [?] and the
procedures Join-Orderings and Construct-Join-Orderings described in Section 3.

An interesting consequence of our result is that the pathwidth of bounded
degree partial w-trees can be computed in $n^{O((wd)^2)}$ steps. We mention that the
existence of a polynomial time algorithm for this problem, without the degree
restriction, has been proved in [?]. However, the time complexity of the involved
algorithm appears to be very large and has not been reported. Our technique,
described in detail in the full version [?], reduces the computation of pathwidth



A Polynomial Time Algorithm for the Cutwidth of Bounded Degree Graphs 383 



to the problem of computing the cutwidth on hypergraphs. Then the pathwidth is 
computed using a generalization of our algorithm for hypergraphs with bounded 
treewidth. That way, we report more reasonable time bounds, provided that the 
input graph has bounded degree. 

2 Definitions and Preliminary Results 

All graphs of this paper are finite, undirected, and without loops or multiple
edges (our results can be straightforwardly generalized to the case where the
last restriction is relaxed). We denote the vertex (edge) set of a graph G by
V(G) (E(G)). A linear ordering of the vertices of G is a bijection mapping
V(G) to the integers in $\{1, \dots, n\}$. We denote such a vertex ordering by the
sequence $[v_1, \dots, v_n]$.

We proceed with a number of definitions and notations dealing with finite
sequences (i.e., ordered sets) of a given finite set O (most of the notation in this
paper is taken from [?] and [?]). For our purposes, O can be a set of numbers,
sequences of numbers, vertices, or vertex sets. The set of sequences of elements
of O is denoted by $O^*$. Let $\omega$ be a sequence of elements from O. We use the
notation $[\omega_1, \dots, \omega_r]$ to represent $\omega$, and we define $\omega[i, j]$ as the subsequence
$[\omega_i, \dots, \omega_j]$ of $\omega$ (in case $j < i$, the result is the empty subsequence $[\,]$). We also
denote by $\omega(i)$ the element of $\omega$ indexed by i.

Given a set S containing elements of O, and a sequence $\omega$, we denote by
$\omega[S]$ the subsequence of $\omega$ that contains only those elements of $\omega$ that are in S.
Given two sequences $\omega^1, \omega^2$ from $O^*$, where $\omega^i = [\omega^i_1, \dots, \omega^i_{r_i}]$, $i = 1, 2$, we define
the concatenation of $\omega^1$ and $\omega^2$ as $\omega^1 \oplus \omega^2 = [\omega^1_1, \dots, \omega^1_{r_1}, \omega^2_1, \dots, \omega^2_{r_2}]$. Unless
mentioned otherwise, we always consider that the first element of a sequence $\omega$
is indexed by 1, i.e., $\omega = \omega[1, |\omega|]$.
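To make the 1-based sequence notation concrete, here is a small Python sketch mirroring $\omega[i, j]$, $\omega[S]$, and $\omega^1 \oplus \omega^2$ on Python lists. The helper names `sub`, `restrict`, and `concat` are ours and purely illustrative.

```python
def sub(omega, i, j):
    """omega[i, j]: the elements at 1-based positions i..j, or [] if j < i."""
    return list(omega[i - 1:j]) if j >= i else []


def restrict(omega, S):
    """omega[S]: the subsequence of elements that belong to S."""
    return [x for x in omega if x in S]


def concat(*seqs):
    """omega^1 (+) omega^2 (+) ...: concatenation of sequences."""
    return [x for seq in seqs for x in seq]


omega = ["a", "b", "c", "d"]
print(sub(omega, 2, 3))             # ['b', 'c']
print(restrict(omega, {"d", "a"}))  # ['a', 'd']
print(concat(["x"], omega[:2]))     # ['x', 'a', 'b']
```

Note that $\omega[S]$ keeps the order of $\omega$, not the (unordered) set S.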

Let G be a graph and $S \subseteq V(G)$. We call the graph $(S, E(G) \cap \{\{x, y\} \mid x, y \in S\})$
the subgraph of G induced by S, and we denote it by $G[S]$. We denote by
$E_G(S)$ the set of edges of G that have an endpoint in S; we also set $E_G(v) = E_G(\{v\})$
for any vertex v. If $E \subseteq E(G)$, then we denote by $V_G(E)$ the set of all
the endpoints of the edges in E, i.e., we set $V_G(E) = \bigcup_{e \in E} e$. The neighborhood
of a vertex v in graph G is the set of vertices in G that are adjacent to v in G,
and we denote it by $N_G(v)$, i.e., $N_G(v) = V_G(E_G(v)) - \{v\}$. If l is a sequence of
vertices, we denote the set of its vertices by $V(l)$. If $S \subseteq V(l)$, then we define $l[S]$
as the subsequence of l containing only the vertices of S. If l is a sequence of all
the vertices of G without repetitions, then we call it a vertex ordering of G.
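The graph notation above translates directly into set operations. A minimal Python sketch, assuming (our own choice) that undirected edges are stored as `frozenset`s of their two endpoints; the function names are illustrative only.

```python
def edges_at(edges, S):
    """E_G(S): the edges with at least one endpoint in S."""
    return {e for e in edges if e & S}


def endpoints(E):
    """V_G(E): the union of the endpoints of the edges in E."""
    return set().union(*E)


def neighborhood(edges, v):
    """N_G(v) = V_G(E_G(v)) - {v}."""
    return endpoints(edges_at(edges, {v})) - {v}


def induced_edges(edges, S):
    """E(G[S]): the edges with both endpoints in S."""
    return {e for e in edges if e <= S}


edges = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4)]}
print(sorted(neighborhood(edges, 2)))  # [1, 3]
```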



2.1 Treewidth – Pathwidth – Cutwidth

A nice tree decomposition of a graph G is a triple $(X, U, r)$ where U is a tree
rooted on r whose vertices we call nodes and $X = (\{X_i \mid i \in V(U)\})$ is a
collection of subsets of V(G) such that
1. $\bigcup_{i \in V(U)} X_i = V(G)$,
2. for each edge $\{v, w\} \in E(G)$, there is an $i \in V(U)$ such that $v, w \in X_i$,
3. for each $v \in V(G)$ the set of nodes $\{i \mid v \in X_i\}$ forms a subtree of U,
4. every node of U has at most two children,
5. if a node i has two children j, h then $X_i = X_j = X_h$,
6. if a node i has one child j, then either $|X_i| = |X_j| + 1$ and $X_j \subset X_i$, or $|X_i| = |X_j| - 1$ and $X_i \subset X_j$.

The width of a tree decomposition $(\{X_i \mid i \in V(U)\}, U)$ equals
$\max_{i \in V(U)}\{|X_i| - 1\}$. The treewidth of a graph G is the minimum width over
all tree decompositions of G. According to [?], it is possible, for any constant
$k \ge 1$, to construct a linear time algorithm that, for any G, outputs (if it exists)
a nice tree decomposition of G of width at most k and with at most $O(|V(G)|)$ nodes.

A nice tree decomposition $(\{X_i \mid i \in V(U)\}, U, r)$ contains nodes of the
following four possible types. A node $i \in V(U)$ is called "start" if i is a leaf of
U, "join" if i has two children, "forget" if i has only one child j and $|X_i| < |X_j|$,
and "introduce" if i has only one child j and $|X_i| > |X_j|$. We may also assume that
if i is a start node then $|X_i| = 1$: the effect of start nodes with $|X_i| > 1$ can be
obtained by using a start node with a set containing 1 vertex, and then $|X_i| - 1$
introduce nodes, which add all the other vertices.
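The four node types can be read off from the number of children and the bag sizes. The Python sketch below is illustrative only: it assumes the rooted tree is given as a hypothetical `children` dict (node to list of children) and the bags as a dict of sets. It classifies nodes and checks conditions 4 to 6; conditions 1 to 3 and the $|X_i| = 1$ convention for start nodes are not checked.

```python
def node_type(i, children, bags):
    """Type of node i in a nice tree decomposition."""
    ch = children.get(i, [])
    if not ch:
        return "start"
    if len(ch) == 2:
        return "join"
    (j,) = ch
    return "forget" if len(bags[i]) < len(bags[j]) else "introduce"


def satisfies_nice_conditions(children, bags):
    """Check conditions 4-6 of a nice tree decomposition (1-3 are not checked)."""
    for i, ch in children.items():
        if len(ch) > 2:
            return False
        if len(ch) == 2 and not (bags[i] == bags[ch[0]] == bags[ch[1]]):
            return False
        if len(ch) == 1:
            Xi, Xj = bags[i], bags[ch[0]]
            grows = len(Xi) == len(Xj) + 1 and Xj < Xi   # introduce node
            shrinks = len(Xi) == len(Xj) - 1 and Xi < Xj  # forget node
            if not (grows or shrinks):
                return False
    return True


bags = {0: {1, 2}, 1: {1, 2}, 2: {1, 2}, 3: {1}, 4: {2}}
children = {0: [1, 2], 1: [3], 2: [4]}
print([node_type(i, children, bags) for i in range(5)])
# ['join', 'introduce', 'introduce', 'start', 'start']
```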

Let $D = (X, U, r)$ be a nice tree decomposition of a graph G. For each node
i of U, let $U_i$ be the subtree of U rooted at node i. For any $i \in V(U)$, we
set $V_i = \bigcup_{v \in V(U_i)} X_v$ and $G_i = G[V_i]$. For any $p \in V(U)$ we refine $G_p$ in a
top-down fashion as follows. If q is a join node with children p and p', select one
of its two children, say p. Then, for any $i \in V(U_p)$, remove from $G_i$ any edge in
the set $E(G_q[X_q])$ (in fact, any partition of $E(G_q[X_q])$ into the edges induced
by $G_p[X_p]$ and $G_{p'}[X_{p'}]$ would be suitable for the purposes of this paper). In
this construction, we have $V(G_p) = V_p$ for any $p \in V(U)$, and we guarantee
that if q is a join node with children p and p', then $V(G_p) \cap V(G_{p'}) = X_q$,
$E(G_p[X_q]) \cap E(G_{p'}[X_q]) = \emptyset$, and $E(G_p) \cup E(G_{p'}) = E(G_q)$. Notice that if r is
the root of U, then $G_r = G$. We call $G_i$ the subgraph of G rooted at i. We finally
set, for any $i \in V(U)$, $D_i = (X^i, U_i, i)$ where $X^i = \{X_v \mid v \in V(U_i)\}$. Observe
that for each node $i \in V(U)$, $D_i$ is a tree decomposition of $G_i$.

A tree decomposition $(X, U)$ is a path decomposition if U is a path. The
pathwidth of a graph G is defined analogously.

The cutwidth of a graph G with n vertices is defined as follows. Let
$l = [v_1, \dots, v_n]$ be a vertex ordering of V(G). For $i = 1, \dots, n - 1$, we define
$\theta_{l,G}(i) = E_G(l[1, i]) \cap E_G(l[i + 1, n])$ (i.e., $\theta_{l,G}(i)$ is the set of edges of G that have one
endpoint in $l[1, i]$ and one in $l[i + 1, n]$). The cutwidth of an ordering l of V(G) is
$\max_{1 \le i \le n-1}\{|\theta_{l,G}(i)|\}$. The cutwidth of a graph is the minimum cutwidth over
all the orderings of V(G).
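The definition can be evaluated directly. The Python sketch below is only a definitional illustration: it computes the cutwidth of one ordering, and the cutwidth of a graph by brute force over all $n!$ orderings, which is feasible only for tiny graphs and is unrelated to the polynomial algorithm of this paper. Edges are assumed to be given as pairs.

```python
from itertools import permutations


def ordering_cutwidth(order, edges):
    """Cutwidth of the ordering l = order: max over the n-1 gaps of |theta_{l,G}(i)|."""
    pos = {v: p for p, v in enumerate(order)}  # 0-based positions

    def crossing(i):
        # Edges with one endpoint in l[1, i] (positions < i) and one in l[i+1, n].
        return sum(1 for u, v in edges
                   if min(pos[u], pos[v]) < i <= max(pos[u], pos[v]))

    return max((crossing(i) for i in range(1, len(order))), default=0)


def cutwidth(vertices, edges):
    """Cutwidth of G by brute force over all orderings (tiny graphs only)."""
    return min(ordering_cutwidth(list(p), edges) for p in permutations(vertices))


star = [(0, 1), (0, 2), (0, 3)]
print(ordering_cutwidth([1, 0, 2, 3], star))  # 2
print(cutwidth(range(4), star))               # 2
```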

If $l = [v_1, \dots, v_n]$ is a vertex ordering of a graph G, we set

$$Q_{G,l} = [[0], [|\theta_{l,G}(1)|], \dots, [|\theta_{l,G}(n - 1)|], [0]].$$

We also assume that the indices of the elements of $Q_{G,l}$ start from 0 and finish
at n, i.e., $Q_{G,l} = Q_{G,l}[0, n]$. Clearly, $Q_{G,l}$ is a sequence of sequences of numbers,
each containing only one element. It is not hard to prove that any graph G with
treewidth at most w and maximum degree at most d has cutwidth bounded by
$(w + 1) d \log |V(G)|$.
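A direct Python sketch of $Q_{G,l}$ (illustrative only; edges as pairs, the ordering as a list). The result has $n + 1$ one-element entries indexed $0, \dots, n$, and the maximum entry equals the cutwidth of the ordering.

```python
def q_sequence(order, edges):
    """Q_{G,l} = [[0], [|theta(1)|], ..., [|theta(n-1)|], [0]], indexed 0..n."""
    pos = {v: p for p, v in enumerate(order)}
    n = len(order)
    q = [[0]]
    for i in range(1, n):
        q.append([sum(1 for u, v in edges
                      if min(pos[u], pos[v]) < i <= max(pos[u], pos[v]))])
    q.append([0])
    return q


path = [(1, 2), (2, 3), (3, 4)]
print(q_sequence([1, 2, 3, 4], path))  # [[0], [1], [1], [1], [0]]
print(q_sequence([1, 3, 2, 4], path))  # [[0], [1], [3], [1], [0]]
```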
2.2 Sequences of Integers

In this section, we give a number of preliminary results on sequences of integers. 

We denote the set of all sequences of non-negative integers by $\mathcal{S}$. For any
sequence $A = [a_1, \dots, a_{|A|}] \in \mathcal{S}$ and any integer $t \ge 0$ we set $A + t = [a_1 + t, \dots, a_{|A|} + t]$.
If $A, B \in \mathcal{S}$ and $A = [a_1, \dots, a_{|A|}]$, we say that $A \sqsupseteq B$ if B is
a subsequence of A obtained after applying a number of times (possibly none)
the following operations:
(i) If for some i, $1 \le i \le |A| - 1$, $a_i = a_{i+1}$, then set $A \leftarrow A[1, i] \oplus A[i + 2, |A|]$.
(ii) If the sequence contains two elements $a_i$ and $a_j$ such that $j - i \ge 2$ and
$\forall_{i<k<j}\, a_i \le a_k \le a_j$ or $\forall_{i<k<j}\, a_i \ge a_k \ge a_j$, then set $A \leftarrow A[1, i] \oplus A[j, |A|]$.

We define the compression $\tau(A)$ of a sequence $A \in \mathcal{S}$ as the unique minimum
length element of $\{B \mid A \sqsupseteq B\}$. Notice that $B = \tau(A)$ is a subsequence
$[a_{i_1}, \dots, a_{i_{|B|}}]$ of $A = [a_1, \dots, a_{|A|}]$ such that for any j, $1 \le j \le |B| - 1$, either
$a_{i_j} \le a_{i_j + 1} \le \dots \le a_{i_{j+1} - 1} \le a_{i_{j+1}}$ or $a_{i_j} \ge a_{i_j + 1} \ge \dots \ge a_{i_{j+1} - 1} \ge a_{i_{j+1}}$. We
can now define a function $\beta_A : \{1, \dots, |\tau(A)|\} \to \{1, \dots, |A|\}$ where $\beta_A(j) = i_j$
is one of the possible original positions in A of the j-th element in $\tau(A)$. Analogously,
we define the function $\delta_A : \{1, \dots, |A|\} \to \{1, \dots, |\tau(A)|\}$ such that
$\delta_A(j)$ is the unique i such that there exists a function $\beta_A$ where $\beta_A(i) = j$.
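For any $A \in \mathcal{S}$, we define $\alpha(A)$ in the same way as $\tau(A)$ with the difference
that only operation (i) is considered, i.e., we remove repetitions of a number on
successive positions in the sequence. A sequence A is typical if $\tau(A) = A$. If A is a typical
sequence, we define the set of extensions of A as $\mathcal{E}(A) = \{A^* \in \mathcal{S} \mid \alpha(A^*) = A\}$. Let
$A = [a_1, \dots, a_{r_1}]$ and $B = [b_1, \dots, b_{r_2}]$ be two sequences in $\mathcal{S}$. We say that $A \le B$ if $r_1 = r_2$
and $\forall_{1 \le i \le r_1}\, a_i \le b_i$. In general, we say that $A \prec B$ if there exist extensions
$A^* \in \mathcal{E}(A)$ and $B^* \in \mathcal{E}(B)$ such that $A^* \le B^*$. Suppose now that $\mathcal{A} = [A_1, \dots, A_r]$
and $\mathcal{B} = [B_1, \dots, B_r]$ are two sequences of typical sequences. We say that
$\mathcal{A} \prec \mathcal{B}$ if $\forall_{1 \le i \le r}\, A_i \prec B_i$. For any integer t we set $\mathcal{A} + t = [A_1 + t, \dots, A_{|\mathcal{A}|} + t]$
and $\max(\mathcal{A}) = \max_{1 \le i \le |\mathcal{A}|}\{\max A_i\}$. Finally, for any sequence of typical sequences
$\mathcal{A}$ we set $\tau(\mathcal{A}) = \tau(\mathcal{A}(1) \oplus \dots \oplus \mathcal{A}(|\mathcal{A}|))$.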
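The compression operations can be sketched in a few lines of Python. This is an illustration, not the authors' implementation: it assumes, as is standard for typical sequences, that applying operations (i) and (ii) exhaustively, in any order, reaches the unique $\tau(A)$. The naive search for an applicable pair $(i, j)$ is quadratic per pass and is meant only for small examples.

```python
def alpha(a):
    """alpha(A): drop repeated values on successive positions (operation (i))."""
    out = []
    for x in a:
        if not out or out[-1] != x:
            out.append(x)
    return out


def tau(a):
    """tau(A): apply operations (i) and (ii) until neither applies."""
    a = alpha(list(a))
    reduced = True
    while reduced:
        reduced = False
        for i in range(len(a)):
            for j in range(i + 2, len(a)):
                mid = a[i + 1:j]
                # Operation (ii): everything strictly between a_i and a_j lies
                # monotonically between them, so it can be removed.
                if (all(a[i] <= x <= a[j] for x in mid)
                        or all(a[i] >= x >= a[j] for x in mid)):
                    a = alpha(a[:i + 1] + a[j:])
                    reduced = True
                    break
            if reduced:
                break
    return a


def is_typical(a):
    """A is typical iff tau(A) = A."""
    return tau(a) == list(a)


print(tau([1, 3, 2, 4]))        # [1, 4]
print(tau([0, 1, 2, 1, 3, 0]))  # [0, 3, 0]
```

Intuitively, $\tau$ keeps only the "turning points" of a sequence that still matter for its extremes, which is why the number of typical sequences with bounded values stays small.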

Let A, B be two equal-size sequences of $\mathcal{S}$ where $A = [a_1, \dots, a_r]$ and
$B = [b_1, \dots, b_r]$. We define $A + B = [a_1 + b_1, \dots, a_r + b_r]$, and we say that $A \sim B$ iff

$\forall_{1 \le i < r}\; a_i \ne a_{i+1} \Rightarrow b_i = b_{i+1}$ (and, therefore, $b_i \ne b_{i+1} \Rightarrow a_i = a_{i+1}$).

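The interleaving $A \otimes B$ of two typical sequences A and B is a set of typical
sequences defined as follows:

$$A \otimes B = \{\tau(A^* + B^*) \mid A^* \in \mathcal{E}(A),\ B^* \in \mathcal{E}(B),\ \text{and } A^* \sim B^*\}.$$

The interleaving of two sequences of typical sequences $\mathcal{A} = [A_1, \dots, A_w]$ and
$\mathcal{B} = [B_1, \dots, B_w]$, where $w = |\mathcal{A}| = |\mathcal{B}|$, is defined as follows:

$$\mathcal{A} \otimes \mathcal{B} = \{[C_1, \dots, C_w] \mid C_i \in A_i \otimes B_i,\ i = 1, \dots, w\}.$$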
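Because $A^* \sim B^*$ forbids both extensions from changing value at the same step, every compatible pair of extensions corresponds, up to repeated values that $\tau$ removes anyway, to a monotone staircase walk through the index grid of A and B. The Python sketch below enumerates these walks; this reformulation is our own reading of the definition, the enumeration is exponential, and the compression $\tau$ is re-implemented so that the block is self-contained.

```python
from itertools import product


def alpha(a):
    """Remove repetitions on successive positions."""
    out = []
    for x in a:
        if not out or out[-1] != x:
            out.append(x)
    return out


def tau(a):
    """Typical sequence of a (operations (i) and (ii) applied exhaustively)."""
    a = alpha(list(a))
    reduced = True
    while reduced:
        reduced = False
        for i in range(len(a)):
            for j in range(i + 2, len(a)):
                mid = a[i + 1:j]
                if (all(a[i] <= x <= a[j] for x in mid)
                        or all(a[i] >= x >= a[j] for x in mid)):
                    a = alpha(a[:i + 1] + a[j:])
                    reduced = True
                    break
            if reduced:
                break
    return a


def interleave(a, b):
    """A (x) B for two typical sequences, via staircase walks over the index grid."""
    results = set()

    def walk(i, j, acc):
        acc = acc + [a[i] + b[j]]
        if i == len(a) - 1 and j == len(b) - 1:
            results.add(tuple(tau(acc)))
            return
        if i < len(a) - 1:
            walk(i + 1, j, acc)  # only A changes value
        if j < len(b) - 1:
            walk(i, j + 1, acc)  # only B changes value

    walk(0, 0, [])
    return results


def interleave_seqs(A, B):
    """Interleaving of two equal-length sequences of typical sequences."""
    assert len(A) == len(B)
    options = [sorted(interleave(x, y)) for x, y in zip(A, B)]
    return {tuple(c) for c in product(*options)}


print(sorted(interleave([1, 0], [0, 2])))  # [(1, 0, 2), (1, 3, 2)]
```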

Given two sequences $B_1$ and $B_2$ where $B_1 \sim B_2$, we define the function
$\nu_{B_1,B_2} : \{1, \dots, |B_1| - 1\} \to \{1, 2\}$ such that $\nu(j) = 1$ if $B_1(j) \ne B_1(j + 1)$ and $\nu(j) = 2$
if $B_1(j) = B_1(j + 1)$ (i.e., $\nu$ indicates which one of $B_1, B_2$ changes value
between indices j and j + 1). When the sequences $B_1$ and $B_2$ are obvious, we
simply denote $\nu_{B_1,B_2}$ by $\nu$.
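The relation $\sim$, the sum $A + B$, and the indicator $\nu$ are easy to state in code. A minimal Python sketch (function names ours), returning $\nu$ as the list $[\nu(1), \dots, \nu(|B_1| - 1)]$:

```python
def compatible(a, b):
    """A ~ B: same length, and at every step at most one of A, B changes value."""
    return len(a) == len(b) and all(
        a[i] == a[i + 1] or b[i] == b[i + 1] for i in range(len(a) - 1))


def add(a, b):
    """A + B: elementwise sum of two equal-length sequences."""
    return [x + y for x, y in zip(a, b)]


def nu(b1, b2):
    """[nu(1), ..., nu(|B1| - 1)]: 1 where B1 changes value, 2 otherwise."""
    if not compatible(b1, b2):
        raise ValueError("nu is only defined for B1 ~ B2")
    return [1 if b1[j] != b1[j + 1] else 2 for j in range(len(b1) - 1)]


b1, b2 = [0, 1, 1, 1], [2, 2, 3, 3]
print(nu(b1, b2), add(b1, b2))  # [1, 2, 2] [2, 3, 4, 4]
```

When neither sequence changes (as at the last step above), $\nu$ reports 2 by definition.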




386 



D.M. Thilikos, M.J. Serna, and H.L. Bodlaender 



2.3 Characteristic Pairs 

A characteristic pair is any pair $(\lambda, \mathcal{A})$ where $\lambda$ is a sequence over a set O and
$\mathcal{A}$ is a sequence of typical sequences such that $|\mathcal{A}| = |\lambda| + 1$. Notice that for any
graph G and any ordering l of V(G), the pair $(l, Q_{G,l})$ is a characteristic pair. The
width of a characteristic pair $(\lambda, \mathcal{A})$ is defined as $\max(\mathcal{A})$.

Procedure Com in Fig. 1 defines the compression of a characteristic pair
relative to a subset of O.

Procedure Com(l, R, S).

Input: A characteristic pair (l, R) and a set S.

Output: A characteristic pair $(\lambda, \mathcal{A})$.

We assume the notations $l = [v_1, \dots, v_{|l|}]$ and $l[S] = [v_{i_1}, v_{i_2}, \dots, v_{i_\rho}]$.

1: $\lambda \leftarrow l[S]$.

2: $\mathcal{A} \leftarrow [\tau(R[0, i_1 - 1]), \tau(R[i_1, i_2 - 1]), \dots, \tau(R[i_{\rho-1}, i_\rho - 1]), \tau(R[i_\rho, |l|])]$.

3: Output $(\lambda, \mathcal{A})$.

4: End.

Fig. 1. The procedure Com.
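A Python sketch of procedure Com, assuming (our own representation) that $l$ is a list of vertices and $R = [R(0), \dots, R(|l|)]$ a list of $|l| + 1$ integer sequences. Each block of R between two consecutive positions of S is concatenated and compressed, as in step 2; $\tau$ is re-implemented so that the block runs on its own.

```python
def alpha(a):
    """Remove repetitions on successive positions."""
    out = []
    for x in a:
        if not out or out[-1] != x:
            out.append(x)
    return out


def tau(a):
    """Typical sequence of a (operations (i) and (ii) applied exhaustively)."""
    a = alpha(list(a))
    reduced = True
    while reduced:
        reduced = False
        for i in range(len(a)):
            for j in range(i + 2, len(a)):
                mid = a[i + 1:j]
                if (all(a[i] <= x <= a[j] for x in mid)
                        or all(a[i] >= x >= a[j] for x in mid)):
                    a = alpha(a[:i + 1] + a[j:])
                    reduced = True
                    break
            if reduced:
                break
    return a


def com(order, R, S):
    """Com(l, R, S): returns (lambda, A) with |A| = |lambda| + 1."""
    lam = [v for v in order if v in S]                    # step 1: lambda = l[S]
    idx = [p + 1 for p, v in enumerate(order) if v in S]  # 1-based i_1 < ... < i_rho
    bounds = [0] + idx + [len(order) + 1]
    blocks = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Step 2: tau(R[lo, hi - 1]), the compressed concatenation R(lo) (+) ... (+) R(hi - 1).
        flat = [x for seq in R[lo:hi] for x in seq]
        blocks.append(tau(flat))
    return lam, blocks                                     # step 3


order = [1, 2, 3, 4]
Q = [[0], [1], [1], [1], [0]]  # Q_{G,l} of the path 1-2-3-4
print(com(order, Q, {2}))      # ([2], [[0, 1], [1, 0]])
```

With $S = V(G)$ every block is a single one-element sequence, so Com returns $(l, Q_{G,l})$ unchanged, matching the remark after Fig. 1.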

Given a graph G with n vertices, a vertex ordering l of G and $S \subseteq V(G)$,
the S-characteristic of l is $C_S(G, l) = \mathrm{Com}(l, Q_{G,l}, S)$. Notice that, from the
definition of the S-characteristic of a vertex ordering l of a graph G, we have
that the V(G)-characteristic of l is equal to $(l, Q_{G,l})$, i.e., $C_{V(G)}(G, l) = (l, Q_{G,l})$
(clearly, $\mathrm{Com}(l, Q_{G,l}, V(G)) = (l, Q_{G,l})$). We will simply use the term characteristic
when there is no confusion on the choice of S and l. Given the S-characteristics
$(\lambda^i, \mathcal{A}^i)$, $i = 1, 2$, of two different vertex orderings of G, we say
that $(\lambda^1, \mathcal{A}^1) \prec (\lambda^2, \mathcal{A}^2)$ when $\lambda^1 = \lambda^2$ and $\mathcal{A}^1 \prec \mathcal{A}^2$.

Given a graph G and a vertex subset S, we say that a characteristic pair
$(\lambda, \mathcal{A})$ is an S-characteristic when $(\lambda, \mathcal{A}) = C_S(G, l)$ for some ordering l of the
vertices of G. Notice that for any $S \subseteq V(G)$, l is a vertex ordering of G with
width at most k iff the width of $C_S(G, l)$ is at most k.

Assume from now on that we have a graph G and that $(X, U)$ is a nice tree
decomposition of G with width at most w. A set FS(i) of $X_i$-characteristics
of vertex orderings of the graph $G_i$ with cutwidth at most k is called a full set
of characteristics for node i if, for each vertex ordering l of $G_i$ with cutwidth at
most k, there is a vertex ordering l' of $G_i$ such that $C_{X_i}(G_i, l') \prec C_{X_i}(G_i, l)$ and
$C_{X_i}(G_i, l') \in FS(i)$, i.e., the $X_i$-characteristic of l' is in FS(i). The following
lemma can be derived directly from the definitions. For the omitted proofs, see [?].

Lemma 1. A full set of characteristics for a node i is non-empty if and only
if the cutwidth of $G_i$ is at most k. If some full set of characteristics for i is
non-empty, then any full set of characteristics for i is non-empty.

Lemma 2. Let G be a graph with n vertices of degree bounded by d and let (X, U)
be a nice tree decomposition of G with width at most w. Then for any node
i G V{U), |PS'(Z)| < w! (|)“+^ n2d(w+i)\ 
3 An Algorithm for Cutwidth and Its Consequences



In this section, we give for any pair of integer constants k, w, an algorithm that, 
given a graph G with maximum degree d and a nice tree decomposition (X, U) 
of width at most w, decides whether G has cutwidth at most k. 

An important consequence of Lemma 1 is that the cutwidth of G is at most
k if and only if any full set of characteristics for the root r is non-empty (recall
that $G_r = G$). In [?], algorithms are given that construct a full set of
characteristics for an introduce or a forget node i when a full set of characteristics for
the unique child of i is given. In what follows, we show how to compute a full
set of characteristics for a join node i when two full sets of characteristics for its
children $j_1, j_2$ are given.

We now consider the case that node i is a join node and $j_h$, $h = 1, 2$, are the
two children of i in U. We observe that $V(G_{j_1}) \cap V(G_{j_2}) = X_i$, $G_{j_1} \cup G_{j_2} = G_i$,
and we recall that $E(G_{j_1}[X_i]) \cap E(G_{j_2}[X_i]) = \emptyset$. Given a full set of characteristics
$FS(j_1)$ for $j_1$ and a full set of characteristics $FS(j_2)$ for $j_2$, the algorithm Join-Node
in Fig. 2 computes a full set of characteristics FS(i) for i.



Algorithm Join-Node

Input: A full set of characteristics $FS(j_1)$ for $j_1$ and
a full set of characteristics $FS(j_2)$ for $j_2$.
Output: A full set of characteristics $FS(i)$ for i.

1: Initialize $FS(i) = \emptyset$.
2: For any pair of $X_i$-characteristics $(\lambda, \mathcal{A}_h) \in FS(j_h)$, $h = 1, 2$, do
3:   For any $\mathcal{A} \in \mathcal{A}_1 \oplus \mathcal{A}_2$, do
4:     If $\max(\mathcal{A}) \le k$, set $FS(i) \leftarrow FS(i) \cup \{(\lambda, \mathcal{A})\}$.
5: Output $FS(i)$.
6: End.

Fig. 2. The algorithm Join-Node.
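The loop structure of Join-Node can be sketched in Python. This is an illustration under explicit assumptions. A characteristic is modelled as a pair (lam, seq) of hashable tuples. The operator $\mathcal{A}_1 \oplus \mathcal{A}_2$, whose definition is not reproduced here, is passed in as a caller-supplied function `combine`, which is a hypothetical stand-in. As in step 2, two characteristics are only combined when both children induce the same ordering lam of the bag.

```python
from itertools import product


def join_node(fs1, fs2, combine, k):
    """Sketch of Algorithm Join-Node (steps 1-5).

    fs1, fs2 -- full sets of the children j1, j2: sets of pairs (lam, seq), where
                lam is a tuple (an ordering of the bag X_i) and seq is a tuple of
                non-empty tuples of ints (a sequence of typical sequences).
    combine  -- hypothetical stand-in for the operator A1 (+) A2; must return an
                iterable of sequences of typical sequences (tuples of tuples).
    k        -- the cutwidth bound.
    """
    fs = set()                                           # step 1
    for (lam1, a1), (lam2, a2) in product(fs1, fs2):     # step 2
        if lam1 != lam2:       # both characteristics must share the same lambda
            continue
        for a in combine(a1, a2):                        # step 3
            if max(max(t) for t in a) <= k:              # step 4: max(A) <= k
                fs.add((lam1, a))
    return fs                                            # step 5
```

A concrete `combine` has to enumerate every element of $\mathcal{A}_1 \oplus \mathcal{A}_2$. The sketch only reproduces the pairing and the filtering by k.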



Lemma 3. Let G, $G_1$ and $G_2$ be graphs where $G_1 \cup G_2 = G$ and $G_1 \cap G_2 = (S, \emptyset)$. Let also $l_1, l_2$ be vertex orderings of $G_1$ and $G_2$ respectively, where $l_1[S] = l_2[S] = \lambda$. If $C_S(G_i, l_i) = (\lambda, \mathcal{A}_i)$, $i = 1, 2$, then for any $\mathcal{A} \in \mathcal{A}_1 \oplus \mathcal{A}_2$ there exists a vertex ordering l of G where $l[V(G_i)] = l_i$, $i = 1, 2$, and $C_S(G, l) = (\lambda, \mathcal{A})$. Such a vertex ordering can be constructed by procedure Construct-Join-Ordering in Figure 3.

The proof of the following lemma makes strong use of Lemma 3.

Lemma 4. If i is a join node with children $j_1, j_2$, and, for $h = 1, 2$, $FS(j_h)$ is a full set of characteristics for $j_h$, then the set $FS(i)$ constructed by Algorithm Join-Node in Figure 2 is a full set of characteristics for i.

Using Lemma 4, we can conclude the following.






D.M. Thilikos, M.J. Serna, and H.L. Bodlaender 



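Procedure Construct-Join-Ordering($G_1, G_2, \mathcal{A}$).

Input: Two graphs $G_1, G_2$ and a set S where $G_1 \cap G_2 = (S, \emptyset)$.
Two vertex orderings $l_1$ and $l_2$ of $G_1$ and $G_2$, where $l_1[S] = l_2[S] = \lambda$.
A sequence of typical sequences $\mathcal{A} \in \mathcal{A}_1 \oplus \mathcal{A}_2$ where $(\lambda, \mathcal{A}_i) = C_S(G_i, l_i)$, $i = 1, 2$.
Output: A vertex ordering l of G where $C_S(G, l) = (\lambda, \mathcal{A})$.

1: Assume that, for $i = 1, 2$, $l_i = [v^i_1, \ldots, v^i_{r_i}]$ with $r_i = |V(G_i)|$.
2: Let $\lambda = [v^1_{\kappa^1_1}, \ldots, v^1_{\kappa^1_p}] = [v^2_{\kappa^2_1}, \ldots, v^2_{\kappa^2_p}]$, where $p = |S|$.
3: For $i = 1, 2$, set $\kappa^i_0 = 0$ and $\kappa^i_{p+1} = r_i + 1$.
4: For $i = 1, 2$, set $Q_i \leftarrow Q_{G_i,l_i}(0) \oplus \cdots \oplus Q_{G_i,l_i}(r_i)$.
5: For any $h = 0, \ldots, p$:
   set $l^h_1 \leftarrow l_1[\kappa^1_h + 1, \kappa^1_{h+1} - 1]$ and $l^h_2 \leftarrow l_2[\kappa^2_h + 1, \kappa^2_{h+1} - 1]$,
   set $Q^h_1 \leftarrow Q_1[\kappa^1_h, \kappa^1_{h+1} - 1]$ and $Q^h_2 \leftarrow Q_2[\kappa^2_h, \kappa^2_{h+1} - 1]$,
   set $l^h \leftarrow$ Join-Orderings($l^h_1, l^h_2, Q^h_1, Q^h_2, \mathcal{A}(h)$).
6: Set $l \leftarrow l^0 \oplus [\lambda(1)] \oplus l^1 \oplus [\lambda(2)] \oplus \cdots \oplus [\lambda(p-1)] \oplus l^{p-1} \oplus [\lambda(p)] \oplus l^p$.
7: Output l.
8: End.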



Procedure Join-Orderings($l_1, l_2, Q_1, Q_2, A$).

Input: Two orderings $l_1, l_2$, two sequences $Q_1, Q_2$ where $|Q_i| = |l_i| + 1$, $i = 1, 2$,

and a sequence $A \in \tau(Q_1) \oplus \tau(Q_2)$.

Output: An ordering l.

1: Compute $B_1, B_2$ such that $A = \tau(B_1 + B_2)$, where $B_1 \sim B_2$ and $B_i \in E(\tau(Q_i))$, $i = 1, 2$.

2: Set w — |Si| = IS 2 I, and denote 1 / — 

3: For i = 1, . . . ,u; - 1 set 0')). O') + 1)) “ !]■ 

4: Output $m_1 \oplus \cdots \oplus m_{w-1}$.

5: End. 



Fig. 3. Procedures Construct-Join-Ordering and Join-Ordering. 
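Procedure Join-Orderings relies on the typical sequence $\tau(\cdot)$ of an integer sequence. The sketch below assumes the standard definition of Bodlaender and Kloks, which applies two rules until neither changes the sequence:

- delete an element that is equal to its predecessor;
- delete every element strictly between positions i and j whenever all of them lie between the values at positions i and j.

A direct, unoptimized Python sketch of that reduction is:

```python
def typical_sequence(seq):
    """Return tau(seq) by applying the two reduction rules until none applies.

    Rule 1: drop an element equal to its predecessor.
    Rule 2: for positions i < j - 1, if every element strictly between them lies
    between seq[i] and seq[j] (inclusive), drop all of those inner elements.
    """
    a = list(seq)
    changed = True
    while changed:
        changed = False
        # Rule 1: collapse runs of equal consecutive values.
        dedup = [x for idx, x in enumerate(a) if idx == 0 or x != a[idx - 1]]
        if len(dedup) != len(a):
            a = dedup
            changed = True
            continue
        # Rule 2: remove the first inner block that lies within its end values.
        for i in range(len(a)):
            for j in range(i + 2, len(a)):
                lo, hi = min(a[i], a[j]), max(a[i], a[j])
                if all(lo <= x <= hi for x in a[i + 1:j]):
                    a = a[:i + 1] + a[j:]
                    changed = True
                    break
            if changed:
                break
    return a
```

For instance, the sequence 5, 1, 3, 2, 7, 7, 6, 2, 3, 2, 4 reduces to 5, 1, 7, 2, 4. Only the first and last elements and the alternating extreme values survive.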

Theorem 1. An algorithm can be constructed that, given a graph G with n
vertices of degree no more than d and a tree decomposition (X, U) of G of O(n)
nodes and width at most w, computes the cutwidth of G in

steps. 

Our algorithm can be turned into a constructive algorithm by making use of the procedures Construct-Join-Ordering and Join-Ordering. (For the details, see [?].)

Theorem 2. An algorithm can be constructed that, given a graph G with n
vertices of degree no more than d and a tree decomposition (X, U) of G with
O(n) nodes and width at most w, outputs an ordering of V(G) of minimum
cutwidth in steps. 

According to the main results in [?], one can construct an algorithm that,
in n) steps, constructs a minimum width nice tree decomposition 

of any partial w-tree. This algorithm can serve as a preprocessing step for the algorithm of Theorem 2, which, given a partial w-tree G with vertices of degree at most d, outputs a vertex ordering of G of minimum cutwidth.

Our algorithm can be used as the main subroutine of a polynomial time algorithm computing the pathwidth of a bounded degree partial w-tree. We prove the following theorem. (For the details, see [?].)

Theorem 3. An algorithm can be constructed that, given a graph G with n
vertices of degree no more than d and a tree decomposition (X, U) of G of O(n)
nodes and width at most w, outputs a path decomposition of G with minimum
width in log n)^) steps. 
We mention that the problem of computing the pathwidth of partial w-trees can be solved in polynomial time. The algorithm for the general case was given by Bodlaender and Kloks in [?]. However, the exponent in the complexity of this algorithm is quite large for practical purposes. The algorithm proposed in Theorem 3 is faster and can serve as a more realistic approach for partial w-trees with bounded degree.
Abstract. In this paper new and generalized lower bounds for the graph partitioning problem are presented. These bounds are based on the well known lower bound of embedding a clique into the given graph with minimal congestion. This is equivalent to a multicommodity flow problem where each vertex sends a commodity of size one to every other vertex. Our new bounds use arbitrary multicommodity flow instances for the bound calculation; the critical point for the lower bound is the guaranteed cut flow of the instances. Furthermore, a branch & bound procedure based on these bounds is presented. Finally, it is shown that the new bounds are also useful for lower bounds on classes of graphs, e.g. the Butterfly and Benes graphs.



1 Introduction 

Graph Partitioning is the problem of partitioning the vertex set of a graph into disjoint subsets of a given maximal size such that the number of edges with end points in different subsets is minimized. Graph Partitioning is a very common problem and has a large number of applications; circuit layout, compiler design, and load balancing are typical applications in which Graph Partitioning problems appear. Unfortunately, the Graph Partitioning problem is NP-hard. So in recent years a lot of effort has been spent on the development of fast and good heuristics for the problem; a recent survey is given in [?]. These heuristics can often handle rather large graphs with more than a million vertices and deliver good solutions. In contrast to the development of heuristics, only little effort has been spent on the development of exact algorithms. Because of the NP-hardness, in general only relatively small graphs can be solved exactly. Nevertheless, exact solutions are of interest for applications and for the validation of heuristics.

In this paper we present new lower bounds for the Graph Partitioning problem and a branch & bound algorithm for the exact solution of the Graph Partitioning problem using these new lower bounds. The new bounds are based on a well known method for proving lower bounds for the graph bisection problem (a special case of the graph partitioning
IST-1999-14186 (ALCOM-FT), and by the German Science Foundation (DFG) project SFB-376.
problem with two equally sized partitions): if there is an embedding of the clique graph with n vertices ($K_n$) into the given graph G with a congestion C, then each bisection of the graph G cuts at least $n^2/(4C)$ edges. The computation of an embedding with minimal congestion (i.e. with a maximal lower bound) is equivalent to a multicommodity flow problem in which every vertex sends a commodity of size one to every other vertex. Realizing this flow with minimal congestion provides us with the identical lower bound for the graph bisection problem.

The above lower bound is often used for the theoretical analysis of the bisection width of a given graph (e.g. see [?]), and it delivers useful bounds if the graph is quite regular. But the bound is impractical for the construction of a branch & bound algorithm if the graph is more irregular, as is often the case in practice. In this paper we present generalizations of this bound. The generalizations are based on the observation that not every vertex has to send a commodity of identical size to every other vertex. In fact, we can compute lower bounds on the graph partitioning problem for every possible combination of commodities. We only have to know the guaranteed CutFlow, i.e. the amount of flow which crosses the cut of every feasible partitioning of the graph. Furthermore, we have generalized the idea so that it is applicable not only to the graph bisection problem with exactly two equally sized partitions, but also to the general graph partitioning problem with a given number of partitions and a given maximal size for every partition.

In recent years a number of different approaches for solving the graph partitioning problem exactly have been presented. The most recent approach is presented in [?] by S.E. Karisch, F. Rendl, and J. Clausen. They use a semidefinite relaxation for the computation of a lower bound on the Graph Partitioning problem as the core of a branch & bound algorithm. They address the Graph Partitioning problem with two partitions, edge weights, and a maximal partition size; their approach does not handle vertex weights or more than two partitions. In [?] Ferreira et al. present a branch-and-cut algorithm based on a variety of separation heuristics. They address the general Graph Partitioning problem with vertex weights, edge weights, an arbitrary number of partitions, and a maximal partition size. In [?] L. Brunetta, M. Conforti, and G. Rinaldi present another branch-and-cut algorithm. They start with a linear program defining the convex hull of all solutions and use several separation procedures to add cuts to the linear program. They address the bisection problem with edge weights. In [?] E. Johnson, A. Mehrotra, and G. Nemhauser present a column generation algorithm for the Graph Partitioning problem. There, the generation of additional columns is itself NP-hard, so they present efficient strategies for the generation. They address the general Graph Partitioning problem with vertex and edge weights, an arbitrary number of partitions, and a maximal partition size.

Also related to our paper is the work of F. Shahrokhi and L. Szekely in [?]. They examine bounds based on the embedding of the clique $K_n$. They apply this bound to general graphs with a small number of equivalence classes of vertices and use only shortest paths for the flow. Putting this together, they obtain lower bounds on the crossing number, the bisection width, and the edge and vertex expansion of graphs. Their bounds are especially good if there is only one equivalence class of vertices.

In the next section we give basic definitions for the Graph Partitioning problem and Multicommodity Flows. Then, in section three, we present the new lower bounds for the Graph Partitioning problem based on more flexible multicommodity flows. In section four the branch & bound algorithm using these bounds is described; inside the algorithm we can use the solution of the multicommodity flow instance to force vertices to stay in the same partition. Finally, in section five we give upper bounds on the new lower bounds and apply the new lower bounds to the butterfly graph.



2 Definitions 

There have been a couple of slightly different definitions of the Graph Partitioning problem. In this paper we consider a graph with vertex and edge weights, a given number of partitions, and a maximal size for each partition. More formally:

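Definition 1. An instance of the Graph Partitioning problem consists of an undirected graph G = (V, E), vertex weights $g: V \to \mathbb{N}$, edge weights $f: E \to \mathbb{N}$, a number of partitions $p \in \mathbb{N}$, and a maximal size $M \in \mathbb{N}$. The problem is to find a partition of the vertices V into p disjoint sets $V_1, \ldots, V_p$ with $\forall i: \sum_{v \in V_i} g(v) \le M$ and minimal

$$ \mathit{CutSize} := \min_{V_1, \ldots, V_p} \mathit{CutSize}(V_1, \ldots, V_p), \quad \text{where} \quad \mathit{CutSize}(V_1, \ldots, V_p) := \sum_{\{v,w\} \in E,\ v \in V_i,\ w \in V_j,\ i < j} f(\{v, w\}). $$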



In the following we use n = |V| and $N = \sum_{v \in V} g(v)$. The above definition is a very general one. If p = 2 is used, we obtain the well known bisection problem. Often equally sized partitions are requested, i.e. $M = \lceil N/p \rceil$. Notice that in the case p > 2 this is a weaker requirement than the requirement $\forall i, j: ||V_i| - |V_j|| \le 1$, which is also sometimes used.
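Definition 1 can be made concrete with a small Python sketch. It is our own illustration; the dictionary-based encoding and the function names are assumptions, not taken from the paper. The sketch does three things:

- it evaluates $CutSize(V_1, \ldots, V_p)$ for a partition given as a vertex-to-part map;
- it checks the weight limit M;
- it finds the optimum by brute force, which is only practical for a handful of vertices.

```python
from itertools import product


def cut_size(edges, f, assignment):
    """CutSize(V_1, ..., V_p): total weight of edges whose end points lie in
    different parts; assignment maps each vertex to its part index."""
    return sum(f[e] for e in edges if assignment[e[0]] != assignment[e[1]])


def is_feasible(g, assignment, p, M):
    """Every vertex lies in one of the p parts and every part has weight <= M."""
    weights = [0] * p
    for v, part in assignment.items():
        if not 0 <= part < p:
            return False
        weights[part] += g[v]
    return all(weight <= M for weight in weights)


def optimal_cut_size(vertices, edges, f, g, p, M):
    """Minimal CutSize over all feasible partitions (brute force, p**n cases)."""
    best = None
    for labels in product(range(p), repeat=len(vertices)):
        assignment = dict(zip(vertices, labels))
        if is_feasible(g, assignment, p, M):
            size = cut_size(edges, f, assignment)
            best = size if best is None else min(best, size)
    return best
```

On a 4-cycle with unit weights, p = 2 and M = 2, this yields the bisection width 2.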

As mentioned above, we want to use Multicommodity Flows in order to compute lower bounds on the Graph Partitioning problem. In the Multicommodity Flow problem there are commodities of specific sizes, each with a source vertex and a destination vertex. The goal is to route the given set of commodities with minimal congestion. More formally:

Definition 2. An instance of the Multicommodity Flow problem consists of an undirected graph G = (V, E), edge weights $f: E \to \mathbb{N}$, and, for each pair $(v, w) \in V^2$, the size $d_{v,w} \in \mathbb{R}_{\ge 0}$ (with $\forall v \in V: d_{v,v} = 0$) of the commodity which has to flow from vertex v to vertex w. A flow $h: V \times V \times V \to \mathbb{R}_{\ge 0}$ is sought with $h(u, v, w) = 0$ if $\{v, w\} \notin E$, and

$$ \forall v, w \in V, v \ne w: \quad \sum_{u \in V} \big( h(v, u, w) - h(v, w, u) \big) = d_{v,w}. $$

The congestion $c: E \to \mathbb{R}$ of each edge is $c(\{v, w\}) := \sum_{u \in V} \big( h(u, v, w) + h(u, w, v) \big) / f(\{v, w\})$. A flow h is sought with minimal congestion $C := \max_{e \in E} c(e)$.
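The bookkeeping of Definition 2 can be illustrated with a short Python sketch. The encoding is hypothetical and not the paper's implementation:

- `h[(u, v, w)]` is the amount of the commodity with source u that is sent along v → w;
- edge weights are keyed by `frozenset({v, w})`;
- the congestion of an edge is read as its total load divided by its weight f.

```python
from collections import defaultdict


def edge_congestion(f, h):
    """Congestion of every edge: total load of all commodities in both
    directions, divided by the edge weight.

    f -- dict frozenset({v, w}) -> edge weight f({v, w})
    h -- dict (u, v, w) -> flow of the commodity with source u along v -> w
    """
    load = defaultdict(float)
    for (u, v, w), amount in h.items():
        edge = frozenset((v, w))
        if edge not in f:
            raise ValueError(f"flow on non-edge ({v}, {w})")
        load[edge] += amount
    return {edge: load[edge] / weight for edge, weight in f.items()}


def meets_demands(vertices, h, d, tol=1e-9):
    """Flow conservation: for all v != w the net inflow of commodity v into w
    equals d(v, w) (missing demands count as 0)."""
    net = defaultdict(float)
    for (u, v, w), amount in h.items():
        net[(u, w)] += amount   # commodity u enters w
        net[(u, v)] -= amount   # commodity u leaves v
    return all(abs(net[(v, w)] - d.get((v, w), 0.0)) <= tol
               for v in vertices for w in vertices if v != w)
```

The overall congestion C is then `max(edge_congestion(f, h).values())`.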

The Multicommodity Flow problem is a well known problem. It can be represented as a linear program of polynomial size, so it is solvable in polynomial time. Fast and approximation algorithms for the Multicommodity Flow problem are a subject of current research, see e.g. [?].






N. Sensen 



3 Multicommodity Bounds 



General Idea. The main idea for generalizing the known lower bound on the graph bisection problem into equally sized partitions is the following. The known bound is based on the embedding of a $K_n$ into the given graph with minimal congestion C. Any bisection of a $K_n$ has a CutSize of at least $n^2/4$. Since each edge of the given graph is used by at most C edges of the $K_n$, a CutSize of at least $n^2/(4C)$ in the given graph is unavoidable in order to cut the minimal number of edges of the $K_n$. Computing the minimal congestion of this embedding is equivalent to calculating the minimal congestion C of a Multicommodity Flow problem with $\forall v, w \in V, v \ne w: d_{v,w} = 1$. In this case we can argue equivalently to get a lower bound for the Graph Partitioning problem. Any bisection of the graph has a flow of $n^2/2$ between vertices of different partitions. Since each edge transports at most C commodities (assuming $\forall e \in E: f(e) = 1$), at least $n^2/(2C)$ crossing edges are necessary, i.e. $\mathit{CutSize} \ge n^2/(2C)$. This consideration can be generalized to any Multicommodity Flow instance:



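Theorem 1. Consider a Graph Partitioning problem with a graph G which has to be divided into p partitions. Suppose a Multicommodity Flow with this graph G, sizes d, and congestion C exists with a CutFlow $\in \mathbb{R}$ such that $\sum_{v \in V_i, w \in V_j, i \ne j} d_{v,w} \ge \mathit{CutFlow}$ holds for any optimal partition $V = V_1 \cup \ldots \cup V_p$. Then

$$ \mathit{CutSize} \ge \frac{\mathit{CutFlow}}{C}. $$

Proof. Let $V = \bigcup_i V_i$ be an optimal solution of the Graph Partitioning problem. Every commodity whose source and destination lie in different parts crosses at least one cut edge, and each edge e carries a flow of at most $C \cdot f(e)$. Hence

$$ \mathit{CutFlow} \le \sum_{v \in V_i, w \in V_j, i \ne j} d_{v,w} \le \sum_{\{v,w\} \in E,\ v \in V_i,\ w \in V_j,\ i < j} C \cdot f(\{v, w\}) = C \cdot \mathit{CutSize}(V_1, \ldots, V_p). $$

Since we have assumed that $V = \bigcup_i V_i$ is optimal, $\mathit{CutSize} \ge \mathit{CutFlow}/C$ follows. □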



Different Instantiations. So in principle we could use any Multicommodity Flow instance for the computation of a lower bound on the Graph Partitioning problem. The larger the guaranteed CutFlow and the smaller the congestion C, the better the bound. In the following we introduce three different general Multicommodity Flow instances with different degrees of freedom in the choice of the sizes of the commodities. The first one corresponds to the known bound where every vertex sends a commodity of size one to every other vertex; we have only adapted it to take vertex weights into account:

Definition 3. The 1-1-MC is a Multicommodity Flow instance with a graph G = (V, E), vertex weights $g: V \to \mathbb{N}$, and default sizes of the commodities:

$$ \forall v, w \in V, v \ne w: \quad d_{v,w} = g(v) \cdot g(w) $$

The next Multicommodity Flow instance allows a variable source strength for each vertex. The idea behind this is that more central vertices generate less total load on the edges, so we hope to get a larger CutFlow with a smaller congestion:
Definition 4. The VarMC is a Multicommodity Flow instance with a graph G = (V, E), vertex weights $g: V \to \mathbb{N}$, and a free source strength $s: V \to \mathbb{R}_{\ge 0}$ such that

$$ \forall v, w \in V, v \ne w: \quad d_{v,w} = g(w) \cdot s(v). $$

To compute a good lower bound on the graph partitioning problem, the free source strengths have to be adapted to the given graph such that the bound is maximized. The source strengths are not selected by hand; their selection can be included in the linear program which also solves the Multicommodity Flow problem. Finally, we introduce a Multicommodity Flow instance in which all sizes of the commodities are free:

Definition 5. The MVarMC is a Multicommodity Flow instance with a graph G = (V, E), vertex weights $g: V \to \mathbb{N}$, and a free source strength $s: V \times V \to \mathbb{R}_{\ge 0}$ with $s(v, v) = 0$ for all $v \in V$ such that

$$ \forall v, w \in V: \quad d_{v,w} = g(w) \cdot s(v, w). $$

The critical point for the use of a Multicommodity Flow instance when computing a lower bound on the graph partitioning problem is the guaranteed CutFlow. The following theorem gives a valid guaranteed CutFlow for the MVarMC instance:

Theorem 2. Consider a Graph Partitioning problem with a graph G = (V, E), vertex weights g, and the constants p and M. The Multicommodity Flow instance MVarMC guarantees a flow over each feasible partition of

$$ \mathit{CutFlow} \ge \sum_{v \in V} \Big( \sum_{w \in V} s(v, w)\, g(w) - S_v\,(M - g(v)) \Big) + s\,R\,(M - R) $$

with $S_v := \max_{w \in V} s(v, w)$, $s := \min_{v \in V} S_v / g(v)$, and $R := N - M \lfloor N/M \rfloor$.

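Proof. We are looking for a guaranteed CutFlow of the commodities with sizes $d_{v,w} = s(v, w) \cdot g(w)$. A CutFlow is guaranteed if, for any possible partition according to the parameters M and p, the actual flow between the partitions is at least as large as CutFlow. So assume any feasible partition $V = V_1 \cup \ldots \cup V_p$ with $N_i = g(V_i) := \sum_{v \in V_i} g(v)$. Then

$$
\begin{aligned}
\mathit{CutFlow} &= \sum_{i=1}^{p} \sum_{v \in V_i} \sum_{w \in V \setminus V_i} s(v, w)\, g(w) \\
&\ge \sum_{i=1}^{p} \sum_{v \in V_i} \Big( \sum_{w \in V} s(v, w)\, g(w) - S_v \sum_{w \in V_i \setminus \{v\}} g(w) \Big) \\
&= \sum_{v \in V} \Big( \sum_{w \in V} s(v, w)\, g(w) - (M - g(v))\, S_v \Big) + \sum_{i=1}^{p} (M - N_i) \sum_{v \in V_i} S_v \\
&\ge \sum_{v \in V} \Big( \sum_{w \in V} s(v, w)\, g(w) - (M - g(v))\, S_v \Big) + s \sum_{i=1}^{p} (M - N_i)\, N_i .
\end{aligned}
$$

The first inequality uses $s(v, v) = 0$ and $s(v, w) \le S_v$. The last one uses $M - N_i \ge 0$ and $S_v \ge s \cdot g(v)$. To prove the theorem it remains to show that $\sum_{i=1}^{p} (M - N_i) N_i \ge R(M - R)$ for every feasible partition, i.e. whenever $N_i \le M$ for all i and $\sum_{i=1}^{p} N_i = N$. Since every $N_i$ is bounded by M, $\sum_i N_i^2$ is maximal when $\lfloor N/M \rfloor$ parts have size M and one part has size R. Hence

$$ \sum_{i=1}^{p} (M - N_i) N_i = MN - \sum_{i=1}^{p} N_i^2 \ge MN - \lfloor N/M \rfloor M^2 - R^2 = R(M - R). $$

□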
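The bound of Theorem 2 can be checked numerically on tiny instances. The following Python sketch is an illustration only; the dictionary-based representation and all function names are our own assumptions. It evaluates the right-hand side of Theorem 2, reading s as $\min_v S_v/g(v)$ and R as $N - M\lfloor N/M \rfloor$. It then compares that value against a brute-force minimum of the actual cut flow over all feasible partitions.

```python
from itertools import product


def guaranteed_cutflow(V, g, s, M):
    """Right-hand side of Theorem 2 for the MVarMC instance.

    s maps (v, w) -> source strength s(v, w) (missing pairs count as 0),
    g maps v -> vertex weight, M is the maximal partition size.
    """
    N = sum(g[v] for v in V)
    R = N - M * (N // M)                                    # R := N - M * floor(N / M)
    S = {v: max(s.get((v, w), 0.0) for w in V) for v in V}  # S_v
    s_min = min(S[v] / g[v] for v in V)                     # s := min_v S_v / g(v)
    total = 0.0
    for v in V:
        out = sum(s.get((v, w), 0.0) * g[w] for w in V if w != v)
        total += out - S[v] * (M - g[v])
    return total + s_min * R * (M - R)


def cutflow(V, g, s, assignment):
    """Flow sum over v, w in different parts of s(v, w) * g(w) for one partition."""
    return sum(s.get((v, w), 0.0) * g[w]
               for v in V for w in V if assignment[v] != assignment[w])


def min_feasible_cutflow(V, g, s, p, M):
    """Brute force over all assignments of V to p parts with part weight <= M."""
    best = None
    for labels in product(range(p), repeat=len(V)):
        assignment = dict(zip(V, labels))
        sizes = [sum(g[v] for v in V if assignment[v] == i) for i in range(p)]
        if max(sizes) > M:
            continue
        value = cutflow(V, g, s, assignment)
        best = value if best is None else min(best, value)
    return best
```

With unit weights and $s(v, w) = 1$, which is the 1-1-MC setting, four vertices, p = 2 and M = 2, both functions return 8. In this small case the guarantee is tight.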

Notice that the term R(M − R) corresponds to the problem definition with a maximal size for every partition. If the restriction $\forall i, j: ||V_i| - |V_j|| \le 1$ is used, a different (larger) term is possible.

Based on the CutFlow of the MVarMC instance, it is easy to specify the guaranteed CutFlow of the VarMC and 1-1-MC instances:

Lemma 1. The VarMC instance guarantees a CutFlow of

$$ \mathit{CutFlow} \ge (N - M) \sum_{v \in V} s(v) + s\,R\,(M - R) $$

with $s := \min_{v \in V} s(v)/g(v)$. The 1-1-MC instance guarantees a CutFlow of $\mathit{CutFlow} \ge N(N - M) + R(M - R)$.

Proof. Using $\forall v, w \in V, v \ne w: s(v, w) = s(v)$ or $\forall v, w \in V, v \ne w: s(v, w) = g(v)$, respectively, the lemma follows from the CutFlow of the MVarMC instance. □
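Here is a small worked instance of the 1-1-MC guarantee. It is a sketch under two assumptions: unit vertex weights, so N = n, and R read as $N - M\lfloor N/M \rfloor$, which is the value that makes the final step of the proof of Theorem 2 tight. Consider a graph with 50 vertices, such as the star graph “Star 50” used in the experiments, under the two settings p = 2, M = 25 and p = 4, M = 13:

```latex
% p = 2, M = 25:  R = 50 - 25 \cdot \lfloor 50/25 \rfloor = 0
\mathit{CutFlow} \ge 50 \cdot (50 - 25) + 0 \cdot (25 - 0) = 1250
% p = 4, M = 13:  R = 50 - 13 \cdot \lfloor 50/13 \rfloor = 50 - 39 = 11
\mathit{CutFlow} \ge 50 \cdot (50 - 13) + 11 \cdot (13 - 11) = 1850 + 22 = 1872
```

Dividing such a guaranteed CutFlow by the congestion C of a feasible flow gives the lower bound of Theorem 1. The congestion itself still has to come from solving the flow problem.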

So altogether we have introduced three different instances with different degrees of freedom. Clearly, the bound of the MVarMC instance is at least as good as the bound of the VarMC instance, which in turn is at least as good as the bound of the 1-1-MC instance. On the other hand, the computation for the MVarMC instance can be expected to take longer, since more variables have to be determined. For practical usage of the bounds we have to compare the quality of the bounds and their running times.

Experimental Results. All three proposed Multicommodity Flow instances can be represented as linear programs. The free source strengths in the VarMC and MVarMC instances are realized by additional variables inside the linear program. A maximal congestion of one is given, and the goal is to maximize the guaranteed CutFlow. Thus, solving the linear program includes the selection of the optimal source strengths. The following linear program corresponds to the MVarMC instance:

$$
\begin{aligned}
\text{maximize}\quad & \sum_{v \in V} \Big( \sum_{w \in V} s(v, w)\, g(w) - S_v\,(M - g(v)) \Big) + s\,R\,(M - R) \\
\text{subject to}\quad & \forall \{v, w\} \in E: && \sum_{u \in V} \big( h(u, v, w) + h(u, w, v) \big) \le f(\{v, w\}) \\
& \forall v, w \in V, v \ne w: && \sum_{u \in V} \big( h(v, u, w) - h(v, w, u) \big) = s(v, w) \cdot g(w) \\
& \forall v, w \in V, v \ne w: && S_v \ge s(v, w) \\
& \forall v \in V: && g(v) \cdot s \le S_v
\end{aligned}
$$

Of course, the constraints for s can be omitted if R = 0.

To solve this linear program we use the barrier algorithm of the CPLEX package ([?]). All experiments have been performed on a Sun Enterprise 450 Model 4400 machine with 1 GB main memory and a Sun UltraSPARC-II 400 MHz processor.
We have performed a large number of tests for different Graph Partitioning problems. Here we show the results for a small representative set of graphs in order to illustrate the typical behavior of the bounds. The “SEd” graph is a shuffle exchange graph of dimension d, the “DBd” graph is a DeBruijn graph of dimension d, “Star 50” is a simple star graph with 50 vertices, and “Grid 5x10” is a two-dimensional grid. “ex36a” and “ex36d” are introduced in [?], both having 36 vertices. In the “a” version each edge is in the graph with probability 0.5; in the “d” version each edge has a random weight between 0 and 10. “Rand 0.1” and “Rand 0.05” are random graphs generated by ourselves, both having 60 vertices; each possible edge is contained in the graph with probability 0.1 or 0.05, respectively. “RandPlan” is a random maximal planar graph with 100 vertices, constructed with routines of the LEDA library [?]. “m4”, “m6”, and “m8” are real world instances from a finite element method, see [?], also used in [?]. “cb30” and “cb61” are real world instances from compiler design problems, also used in [?]. Finally, the “wcbx” graphs are the “cbx” graphs with vertex weights.

The problems in Table 1 with p = 2 and $M = \lceil N/2 \rceil$ correspond to the classical bisection problem with equally sized partitions. For each of the three Multicommodity Flow instances the lower bound and the computation time (format hh:mm:ss) are given. The “Opt” column gives the exact result or, in parentheses, the best known upper bound if the exact solution is unknown. The graphs with missing 1-1-MC and VarMC results are not connected, so it is not possible for every vertex to send something to all other vertices. Bold faced entries in the 1-1-MC column show instances where the bound is significantly worse than the VarMC bound. Bold faced entries in the MVarMC column show instances where the bounds are significantly better than the VarMC bounds. Furthermore, results are presented with p = 2, $M = \lceil 2N/3 \rceil$ and with p = 4, $M = \lceil N/4 \rceil$.



Conclusions from the Experiments 

- The 1-1-MC instance is generally faster than the two other instances. The VarMC and MVarMC instances are sometimes equally fast, and sometimes the VarMC instance is the faster one.

- The 1-1-MC bounds are generally worse than the bounds from the two other instances. With increasing M the gap between the bounds of the 1-1-MC and the VarMC instances gets smaller.

- For p = 2 and M = N/2, VarMC and MVarMC often deliver the same bound, while with increasing p or M the gap between the VarMC and MVarMC bounds gets bigger.



4 Branch & Bound Algorithm 

Branching Realization. In a branch & bound algorithm, a given subproblem which cannot be bounded has to be divided into at least two new restricted subproblems. We do this restriction by determining whether two vertices $v, w \in V$ stay in the same partition or are separated into different partitions. We call these two possibilities “join” and “split”. A join is performed by creating a new graph from the original graph: the two vertices $v, w \in V$, which have to stay in the same partition, are joined into one vertex v with
Table 1. Lower bounds and computation times with the three Multicommodity Flow instances

            1-1-MC            VarMC             MVarMC
Graph       Bound     Time    Bound     Time    Bound     Time    Opt

p = 2, M = ⌈N/2⌉
SE7         14.26     21      15.12     49      15.12     1:30    16
DB7         27.52     41      28.98     1:39    28.98     1:45    30
Star 50     12.76     0       25.00     0       25.00     1       25
Grid 5x10   5.00      1       5.00      2       5.00      3       5
ex36a       92.58     8       104.17    10      104.17    19      117
ex36d       1277.49   41      1385.59   43      1385.59   37      1426
Rand 0.1    15.25     5       30.00     12      37.83     19      42
Rand 0.05   -         -       -         -       9.79      13      10
RandPlan    21.37     15      26.79     22      26.79     50      28
m4          -         -       -         -       5.83      1       6
m6          3.69      3       5.10      8       5.67      9       6
m8          7.00      28      7.00      24      7.00      45      7
cb30        52.23     0       97.50     0       213.00    0       213
cb61        170.50    5       330.00    13      1906.40   10      2177

p = 2, M = ⌈2N/3⌉
SE6         7.39      2       7.42      7       7.44      4       8
DB6         13.98     4       13.99     16      13.99     8       14
Star 50     11.10     0       16.00     0       16.00     1       16
Grid 5x10   4.35      1       4.35      2       4.35      3       5
ex36a       82.29     8       86.18     10      88.48     12      101
ex36d       1135.54   39      1181.32   48      1212.00   42      1246
Rand 0.1    13.56     5       20.00     17      27.30     17      33
Rand 0.05   -         -       -         -       6.00      7       6
RandPlan    18.90     14      20.59     31      20.64     26      21
m4          -         -       -         -       4.69      0       6
m6          3.23      3       3.38      12      3.45      6       4
m8          6.20      27      6.20      41      6.20      41      7
cb30        46.43     0       65.00     0       114.00    0       114
cb61        150.33    6       220.00    14      717.00    8       717
wcb30       38.85     0       53.54     0       126.10    1       136
wcb61       114.31    8       165.92    16      826.08    18      867

p = 4, M = ⌈N/4⌉
SE6         12.58     2       13.37     7       15.15     7       18
DB6         23.79     4       25.44     15      27.17     10      32
Star 50     19.10     0       37.00     0       37.00     1       37
Grid 5x10   7.49      1       7.49      2       13.95     8       16
ex36a       138.86    7       156.25    9       159.00    13      (186)
ex36d       1916.23   38      2078.38   42      2098.00   24      (2192)
Rand 0.1    22.88     5       45.00     11      58.51     22      (69)
Rand 0.05   -         -       -         -       16.56     13      18
RandPlan    32.05     15      40.20     26      53.19     1:09    60
m4          -         -       -         -       11.45     1       12
m6          5.53      3       7.59      9       13.25     17      14
m8          10.50     27      10.50     23      20.70     2:36    22
cb30        78.00     0       143.00    0       430.20    0       436
cb61        255.20    5       495.00    14      4487.88   13      4565
$g(v) := g(v) + g(w)$. The new edge weights are generated accordingly: $\forall u \in V \setminus \{v, w\}: f(\{u, v\}) := f(\{u, v\}) + f(\{u, w\})$. Clearly, this new graph has the same CutSize as the original graph with the restriction that v and w have to be in the same partition.
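The join operation can be sketched in Python as a vertex contraction. This sketch rests on our own encoding assumptions: vertex weights are kept in a dict and edge weights are keyed by `frozenset`. The edge {v, w} itself is dropped, since it can no longer be cut once v and w share a partition.

```python
def join_vertices(g, f, v, w):
    """Merge w into v: g(v) := g(v) + g(w) and f({u, v}) := f({u, v}) + f({u, w}).

    g -- dict vertex -> weight
    f -- dict frozenset({a, b}) -> edge weight
    Returns new dicts and leaves the inputs unchanged.
    """
    g2 = {u: weight for u, weight in g.items() if u != w}
    g2[v] = g[v] + g[w]
    f2 = {}
    for edge, weight in f.items():
        if edge == frozenset((v, w)):
            continue                      # internal to the merged vertex
        if w in edge:
            (u,) = edge - {w}
            edge = frozenset((u, v))      # redirect {u, w} to {u, v}
        f2[edge] = f2.get(edge, 0) + weight  # parallel edges are added up
    return g2, f2
```

For a triangle with edge weights f({0,1}) = 1, f({1,2}) = 2 and f({0,2}) = 3, joining 0 and 1 leaves a single edge {0, 2} of weight 5. That weight equals the CutSize of separating {0, 1} from {2} in the original graph.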

A split of two vertices $v, w \in V$ is performed by removing the commodity with source v and destination w from the general computation of the CutFlow. Instead, this particular flow can be counted completely into the CutFlow. In fact, in case of the MVarMC instance the split means only a reinterpretation of $S_v$, while in case of the VarMC and 1-1-MC instances an additional variable $s(v, w)$ is included in the linear program. Altogether, we manage a set $S \subseteq V^2$ of split pairs in the branch & bound algorithm.



Branching Strategy. A crucial point for a good branch & bound algorithm is the branching strategy; in our application this means deciding which pair of vertices should be used for the join and split in a given situation. Our branching strategy is based on a simple upper bound on the lower bound of the 1-1-MC instance. By assuming that all flows use only shortest paths and all edges have the same congestion, we get a simple upper bound on the lower bound of the 1-1-MC instance. For the branching selection we use a pair of vertices such that this upper bound is maximized in case of a join of these two vertices. Furthermore, in order to take the split case into consideration, we prefer pairs of vertices with a small distance. These two criteria have to be combined. Experiments have shown that a weight of 20 percent for the upper bound on the lower bound and 80 percent for the shortest distance gives a quite good branching strategy.



Forcing Moves. While traversing the subproblems in the search tree, there are some situations where pairs of vertices can be forced to be split or joined. Firstly, if $\exists v \in V: g(v) = M$, this vertex v can be split from all other vertices. Secondly, if $\exists v, w \in V, \{v, w\} \notin S$:
N + g{v) — g{w) — < ^ ~ {p— 1)TT then we can join the vertices 

v and w, since otherwise vertex v has no possibility to become part of a feasible partition. Thirdly, we look at the graph $G_S := (V, S)$. If $G_S$ has two clique subgraphs with p vertices which share exactly $p - 1$ vertices, then the two remaining vertices can be joined: the $p - 1$ shared vertices occupy $p - 1$ different partitions, so both remaining vertices must lie in the last one. Finally, the last possibility for forcing a join is based on a given solution of a Multicommodity Flow instance:

Lemma 2. Consider a Graph Partitioning problem with graph G and edge weights f, and a Multicommodity Flow problem with graph G, congestions $c: E \to \mathbb{R}$, maximal congestion C and CutFlow CF. Suppose we are looking for an improving solution of the Graph Partitioning problem with CutSize at most L. Then the vertices v, w with $\{v, w\} = e \in E$ can be joined if

$$ \frac{CF + (C - c(e)) \cdot f(e)}{C} > L. $$

This possibility for forcing a join is extremely helpful if the bounds are close to a given feasible solution and the graph is somewhat irregular, so that some edges are not loaded with the maximal congestion.
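The join test of Lemma 2 reduces to one comparison per edge. The Python sketch below uses hypothetical names, with `c` and `f` mapping edges to their congestion and weight, and reads the condition of Lemma 2 as $(CF + (C - c(e)) \cdot f(e)) / C > L$. The argument behind it has three steps:

- the guaranteed flow CF has to cross the cut;
- a cut edge e' carries a load of $c(e') f(e')$ with $c(e') \le C$, so $C \cdot CutSize \ge CF + (C - c(e)) f(e)$ whenever e is cut;
- if this exceeds $C \cdot L$, cutting e cannot give an improving solution.

```python
def can_force_join(cut_flow, C, c_e, f_e, L):
    """Join test for one edge e = {v, w}.

    If e were cut, C * CutSize >= CF + (C - c(e)) * f(e): the guaranteed flow CF
    must cross the cut, and every cut edge e' carries c(e') * f(e') with
    c(e') <= C. If this lower bound on CutSize exceeds L, e cannot be cut.
    """
    return (cut_flow + (C - c_e) * f_e) / C > L


def forced_joins(edges, f, c, cut_flow, C, L):
    """All edges whose end points may be joined according to the test above."""
    return [e for e in edges if can_force_join(cut_flow, C, c[e], f[e], L)]
```

Only edges that are not loaded to the maximal congestion, i.e. with $c(e) < C$, can ever pass the test unless $CF/C$ alone already exceeds L. This matches the remark above about irregular graphs.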
Experiments. As there are many good heuristics for the Graph Partitioning problem, it is easy to get a good feasible solution at the beginning of the branch & bound procedure. We use the Party library [?] for this purpose. As in most cases the solution from the heuristic is already optimal, we do not bother with best first search but use simple depth first search. Again, CPLEX is used for the computation of the bounds. The barrier algorithm stops if the primal solution is good enough to bound the current subproblem, or if the dual solution shows that we cannot bound the current subproblem.

Table 2 compares our branch & bound algorithm with the results from Karisch et al. [?], to our knowledge the best current code. The results in [?] were obtained on an HP 9000/735 system. The “#B” column gives the number of search nodes in the branch & bound tree; the time is given in hh:mm:ss. For the problems with the DeBruijn graphs we have utilized two specific properties. Firstly, every bisection of the DeBruijn graph has an even CutSize; Karisch also used this fact. Secondly, symmetrical parts of the search tree, which result from four automorphisms of the DeBruijn graph, are cut off; this decreases the search tree of DB8 by a factor of about two. Finally, we remark that the bisection problem of the DeBruijn graph of dimension 8 is solved by the presented approach for the first time.



Table 2. Results of the branch & bound algorithms with the Graph Bisection problem

          1-1-MC             VarMC              MVarMC             Karisch
Graph     Time       #B      Time       #B      Time       #B      Time       #B
m4        -          -       -          -       1          1       1          1
m6        2:41       56      7          1       10         1       46         1
m7        -          -       -          -       21         1       18:15      15
m8        27         1       15         1       29         1       5:21       1
cb30      3          23      3          21      0          1       2          1
cb47_99   1:49       238     2:34       237     10         7       10         1
cb47_101  22         53      22         40      7          3       1:52       35
cb61      2:19       78      3:38       72      42         6       20         1
DB7       28:10      44      25         1       1:12       1       8:53:20    195
DB8       >50:00:00  >1000   6:54:35    33      11:23:15   33      -          -
ex36a     3:33:07    10058   22:19      197     41:04      197     3          1
ex36d     39:53:38   67571   45:44      101     40:30      101     11         3


Conclusions from the Experiments 

- Of the three Multicommodity Flow instances, MVarMC is the best one for the branch & bound algorithm.

- On the tested real-world applications our approach delivers results comparable to those of Karisch. For the DeBruijn graphs our approach is orders of magnitude better than Karisch's approach. The opposite holds for the randomly generated dense "ex" problems.



Lower Bounds and Exact Algorithms for the Graph Partitioning Problem 401 



We conclude that our approach works well for sparser and more "regular" graphs and less well for "dense" graphs. In most applications of the Graph Partitioning problem, e.g. circuit layout or load balancing, the graphs are in fact relatively "sparse" and "regular", so our approach is well suited to these applications.

5 Theoretical Analyses 

Here we give some theoretical observations on the new bounds. Due to space constraints we only sketch the main ideas.

First, we look at upper bounds on the presented lower bounds. Assume that a graph $G = (V, E)$ has an infeasible cut $V = V_1 \sqcup V_2$. Then the 1-1-MC bound is at most $\frac{|V|^2}{4|V_1||V_2|}\,\mathrm{CutSize}(V_1, V_2)$, since a specific number of
commodities have to cross the given cut. From the existence of an infeasible cut we can also conclude an upper bound of $\frac{|V|}{2\min\{|V_1|, |V_2|\}}\,\mathrm{CutSize}(V_1, V_2)$ on the VarMC bound.
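To make the two upper bounds concrete, the sketch below evaluates the factors multiplying CutSize(V1, V2), assuming they are |V|²/(4|V1||V2|) for 1-1-MC and |V|/(2·min{|V1|, |V2|}) for VarMC. Both follow from dividing the guaranteed CutFlow by the flow forced across the infeasible cut, under the assumption that in 1-1-MC every vertex sends a unit commodity to every other vertex, and in VarMC every vertex sends a commodity of its own variable size to every other vertex. The function names are illustrative.

```python
from fractions import Fraction


def one_one_mc_factor(n1, n2):
    """Upper-bound factor on the 1-1-MC bound implied by a cut with parts of sizes n1, n2."""
    n = n1 + n2
    return Fraction(n * n, 4 * n1 * n2)


def varmc_factor(n1, n2):
    """Upper-bound factor on the VarMC bound implied by the same cut."""
    return Fraction(n1 + n2, 2 * min(n1, n2))


# An unbalanced cut of an 8-vertex graph into parts of sizes 2 and 6:
print(one_one_mc_factor(2, 6), varmc_factor(2, 6))
# A balanced cut gives factor 1 for both: neither bound can exceed that cut size.
print(one_one_mc_factor(4, 4), varmc_factor(4, 4))
```

The 1-1-MC factor never exceeds the VarMC factor, since |V|²/(4|V1||V2|) = (|V|/(2|V1|))·(|V|/(2|V2|)) and the factor belonging to the larger part is at most 1; the more flexible instance is restricted less by the same cut.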

In contrast, no upper bound on the MVarMC bound follows, since no flow at all is forced to cross the cut. Thus the three Multicommodity Flow instances react differently to the given restriction, and it becomes clear that the more flexible the instance, the better the bound can potentially be.

Second, we show that the new bounds can also be used to improve lower bounds on the Graph Partitioning problem for classes of graphs, e.g. the bisection problem on the butterfly graph. The butterfly graph (without wraparound edges) of dimension d has a simple cut with CutSize $= 2^d$. In [?] it has been shown that its bisection width is $2(\sqrt{2} - 1)2^d + o(2^d) \approx 0.83 \cdot 2^d$. The known 1-1-MC bound delivers a lower bound
of (i -I- o(l))2‘^. Now using the possibility of the VarMC instance that not all vertices 
have to send anything, we can improve this lower bound: only the $2^d$ vertices of
the first level and the $2^d$ vertices of the last level send commodities of size
one to all other vertices. This schema gives a lower bound of ( | -f o(l))2‘^. 

The butterfly graph is an example of a fairly regular graph for which the VarMC instance gives a better lower bound on the bisection problem (equally sized parts) than the known 1-1-MC. Furthermore, the above analysis can be adapted to the Benes graph, a kind of back-to-back butterfly. The Benes graph has a similarly simple bisection with CutSize $= 2^{d+1}$. To our knowledge the asymptotically exact bisection
width is unknown. Using the 1-1-MC we get a lower bound of while the VarMC 
instance where only the two most outside levels of vertices and the most inside level of 
vertices send commodities delivers a lower bound of 

6 Conclusion and Further Work 

We have introduced a generalized lower bound on the Graph Partitioning problem. The bound is based on Multicommodity Flow instances with arbitrary commodity sizes for every pair of source and destination. To obtain correct lower bounds from a flow instance, the guaranteed CutFlow is used. By inserting the commodity sizes as variables into the linear program, we automatically obtain the best choice of these sizes for the given graph. We have compared three types of Multicommodity Flow instances with different degrees of freedom for the commodity sizes. Experiments show the superiority of the instance with the largest degree of freedom.

Based on these bounds, a branch & bound algorithm has been presented which computes exact solutions for the Graph Partitioning problem. The comparison with other approaches shows that the presented approach delivers very good results for many graphs. For example, the DeBruijn graph of dimension eight has been solved exactly for the first time. On the other hand, there are graphs, for example the fairly dense random instances introduced by Karisch, for which previous approaches are better suited. Finally, it has been shown that the generalized bounds can also be used for theoretical analyses of graphs and can deliver new lower bounds. Altogether, the new branch & bound algorithm is important for applications, since it can solve previously unsolved problems. Furthermore, the generalized bounds offer new instruments for theoretical analyses.

For a further speed-up of the branch & bound algorithm, three improvements look promising. Firstly, the primal and dual solutions of a node of the search tree could be used as a starting point for the interior point algorithm at the next nodes of the search tree. Secondly, the branch & bound algorithm could be parallelized. Thirdly, specialized algorithms for the Multicommodity Flow problem could be used instead of the general-purpose interior point algorithm. However, our usage of Multicommodity Flow problems does not correspond to the known ones, since the sizes of the commodities have to be selected, so existing algorithms must be adapted. Apart from this, it is also interesting to use the VarMC and MVarMC instances for theoretical analyses of lower bounds on graphs. For the graph bisection problem, the VarMC instance looks promising for fairly "regular" graphs that are not vertex symmetric. For partitions into more than two parts, the MVarMC approach is promising, even for vertex symmetric graphs.
Abstract. This paper develops three polynomial-time techniques for
pricing European Asian options with provably small errors, where the
stock prices follow binomial trees or trees of higher degree. The first
technique is the first known Monte Carlo algorithm with analytical error 
bounds suitable for pricing single-stock options with meaningful confi- 
dence and speed. The second technique is a general recursive bucketing- 
based scheme that enables robust trade-offs between accuracy and run- 
time. The third technique combines the Fast Fourier Transform with 
bucketing-based schemes for pricing basket options. This technique is 
extremely fast, polynomial in the number of days and stocks, and does 
not add any errors to those already incurred in the companion bucketing 
scheme. 



1 Introduction 

A call (respectively, put) option is a contract assigning its holder the right, 
but not the obligation, to buy (respectively, sell) a security at some future time 
for a specified strike price X [?]. If the holder exercises her right, the other
party in the contract, the writer, is obligated to assume the opposite side of the 
transaction. In exchange for this right, the holder pays the writer an option price 
P. The security in this contract can be any financial asset; for the purpose of 
this paper, we restrict it to a single stock or a portfolio of stocks. An option in 
the latter case is commonly called a basket option; for clarity, we call an option 
in the former case a single-stock option. 

Options are popular financial instruments for a variety of trading strategies. 
For example, options can be used to hedge risk. As protection against a potential fall in a stock's price, one can purchase a put on the stock, thereby locking in a minimum sell price. On the other hand, options can provide additional income for stockholders who write calls on their holdings; of course, this strategy carries the risk of being forced to sell the stock should the calls be exercised.

An option is valid until its expiry date. For a European option, the holder may exercise it only on the expiry date. For an American option, the holder may exercise it on any date up to and including the expiry date. The payoff of an option is the amount of money its holder makes on the contract. A European call is worth exercising if and only if $S > X$, where S is the stock price on the expiry date. The payoff of the call is $(S - X)^+ = \max(S - X, 0)$. For an American call, S is set to the stock price at the exercise time. An Asian option comes in European and American flavors, depending on when it may be exercised. For a European Asian call, if A is the average stock price over the entire life of the contract up to the expiry date, the payoff is $(A - X)^+$. For an American Asian call, A is set to the average stock price up to the exercise date. The payoffs of puts are defined symmetrically.
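The payoff definitions translate directly into code. A minimal sketch, assuming a price path is given as a list that includes the initial price, so that the Asian average runs over the entire life of the contract; the function names are illustrative.

```python
def european_call_payoff(path, strike):
    """(S - X)^+ where S is the price on the expiry date."""
    return max(path[-1] - strike, 0.0)


def asian_call_payoff(path, strike):
    """(A - X)^+ where A is the average price over the whole path."""
    average = sum(path) / len(path)
    return max(average - strike, 0.0)


def asian_put_payoff(path, strike):
    """The symmetric put: (X - A)^+."""
    average = sum(path) / len(path)
    return max(strike - average, 0.0)


path = [100.0, 120.0, 80.0]
print(european_call_payoff(path, 95.0), asian_call_payoff(path, 95.0), asian_put_payoff(path, 105.0))
```

On this path the European call expires worthless while the Asian call pays off, since averaging smooths out the final drop.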

The fair price P of an option is the discounted expected value of the payoff under the appropriate martingale measure. Because of the popularity of options, pricing techniques for computing P have been extensively researched; see the references in [?] for details on prior work. Generally, it is more difficult to price a basket option than a single-stock option. To compute P, the price movement of each individual stock needs to be modeled. Typically, it is modeled as Brownian motion with drift. Using a stochastic differential equation, P can then be computed via a closed-form solution to the equation. When a closed-form solution is not known, various approaches are used to find an approximate solution. One class of approaches approximates the solution using numerical methods. Other approaches approximate the Brownian motion model with a discrete model and use this model to approximate P. One such discrete model is the binomial tree model, due to Cox, Ross, and Rubinstein [?]; see Section 2 for the definition of the model.

This paper develops three polynomial-time pricing techniques with provably small errors. The remaining discussion makes the following assumptions: (1) The option in question is a European Asian single-stock or basket call. (2) Our task is to price the call at the start of its contract life. (3) The price of each underlying stock follows the binomial tree model. (4) In the case of a basket call, the price of each underlying stock moves independently. Our results generalize easily to puts, to time points later than the start of the contract life, and to trees with degree higher than two. The cases of American options and of correlated stocks remain open.

Monte Carlo simulation is commonly used in the financial community. Despite this popularity, most reported results on error bounds are experimental or heuristic [?]. Our first technique is the first known Monte Carlo algorithm that has analytical error bounds suitable for pricing a European Asian call with meaningful confidence and speed. As shown by Theorem 2, the number of simulations required is polynomial in (P1) the logarithm of the inverse of the error probability and (P2) the inverse of the price error relative to the strike price, but is exponential in (E1) the square root of the number of underlying stocks and (E2) the volatility of these stocks over the call's life. In particular, the algorithm is reasonably fast and accurate for a single-stock European Asian call with reasonable volatility.

Monte Carlo simulation is a randomized technique, and thus there is always a nonzero probability that the price obtained by polynomial-time Monte Carlo simulation is not accurate enough. The aggregation algorithm of Aingworth, Motwani, and Oldham (AMO) [?] is the first polynomial-time algorithm for pricing single-stock European Asian calls and other path-dependent options with guaranteed worst-case price errors. The AMO algorithm is based on a simple yet powerful idea called bucketing [?]. Our second technique is a general recursive bucketing-based scheme that can use the AMO algorithm, Monte Carlo simulation, and possibly others as the base-case subroutine. This scheme enables robust trade-offs between accuracy and time over subtrees of different sizes. For long-term options or high-frequency price averaging, it can price single-stock European Asian calls with smaller error bounds in less time than the base-case algorithms themselves. In particular, as implied by Theorem ??, given the same runtime, this recursive scheme prices more accurately than the AMO algorithm; similarly, given the same accuracy, the scheme runs faster than the AMO algorithm.

This recursive scheme works for calls written on a single stock. Our third technique combines the Fast Fourier Transform (FFT) with bucketing-based schemes to price basket calls and is applicable to European Asian calls as well as others. As shown in Theorem ??, this technique is extremely fast, polynomial in the number of days and the number of stocks, and does not add any errors to those already incurred in the companion bucketing schemes.

The remainder of this paper is organized as follows. Section 2 reviews the binomial tree model and basic definitions. Section 3 describes the new Monte Carlo algorithm. Section 4 details the recursive scheme. Section 5 gives the FFT-based technique for pricing basket calls.

2 The Binomial Tree Model 

A binomial tree T is a recombinant binary tree. If n is the depth of T, then T has t + 1 nodes at depth t, for $0 \le t \le n$. For $0 \le i \le t$, let T[t, i] (or simply [t, i] if T is obvious from context) be the i-th node at level t of T. For t > 0, T[t, 0] and T[t, t] have one parent each, T[t − 1, 0] and T[t − 1, t − 1] respectively. For 0 < i < t, T[t, i] has two parents, T[t − 1, i − 1] and T[t − 1, i]. The number of nodes in T is $\frac{(n+1)(n+2)}{2}$.
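A small sketch of the indexing convention for T[t, i], checking the parent rules and the node count; representing nodes as (t, i) tuples is an assumption made only for illustration.

```python
def parents(t, i):
    """Parents of node T[t, i] in a recombinant binomial tree."""
    if t == 0:
        return []
    if i == 0:
        return [(t - 1, 0)]
    if i == t:
        return [(t - 1, t - 1)]
    return [(t - 1, i - 1), (t - 1, i)]


def nodes(n):
    """All nodes of a binomial tree of depth n, level by level."""
    return [(t, i) for t in range(n + 1) for i in range(t + 1)]


for n in range(6):
    assert len(nodes(n)) == (n + 1) * (n + 2) // 2
print(parents(3, 0), parents(3, 1), parents(3, 3))
```

Recombination is what keeps the node count quadratic in n, even though the number of root-to-leaf paths is $2^n$.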

Given a stock in the binomial tree model, the stock price is assumed to follow a geometric random walk through T. Time is divided into n equal periods, with the root T[0, 0] corresponding to time t = 0, when the option is priced, and the leaves T[n, ·] corresponding to time t = n, the expiry date of the option. Let s(T[t, i]) (or simply s(t, i)) be the stock price at node T[t, i]. At each time step, the stock price s(t, i) rises to $s(t+1, i+1) = u \cdot s(t, i)$ (an uptick) with probability p or falls to $s(t+1, i) = d \cdot s(t, i)$ (a downtick) with probability $q = 1 - p$. Letting r denote the risk-free interest rate for one period, the parameters u and d satisfy $0 < d < 1 + r < u$ and are typically taken to be $u = 1/d = e^{\sigma/\sqrt{n}}$, where σ is the n-period volatility, or standard deviation, of the stock price [?]. Although the probability p of an uptick is not known in general, for the purposes of pricing options we can use the risk-neutral probability model [?], which states that $p = \frac{1 + r - d}{u - d}$. This makes the expected return on the stock over one period, $pu + (1 - p)d$, equal to the risk-free return, $1 + r$.
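The parameter choice can be checked numerically. A sketch assuming σ is the n-period volatility and r the one-period risk-free rate, as above; the function name is hypothetical.

```python
import math


def crr_parameters(sigma, n, r):
    """Return (u, d, p) with u = 1/d = exp(sigma / sqrt(n)) and the risk-neutral p."""
    u = math.exp(sigma / math.sqrt(n))
    d = 1.0 / u
    if not 0.0 < d < 1.0 + r < u:
        raise ValueError("no-arbitrage condition 0 < d < 1 + r < u is violated")
    p = (1.0 + r - d) / (u - d)
    return u, d, p


u, d, p = crr_parameters(sigma=0.3, n=100, r=0.0005)
# Under the risk-neutral probability the expected one-period return equals 1 + r.
print(u, d, p, p * u + (1 - p) * d)
```

The condition $d < 1 + r < u$ is exactly what guarantees $0 < p < 1$.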

Let Ω be the sample space of paths $\omega = (\omega_1, \ldots, \omega_n)$ down T, where each $\omega_t \in \{-1, 1\}$, with −1 corresponding to a downtick and 1 corresponding to an uptick. Given $\omega \in \Omega$ and $0 \le t \le n$, let T[t, ω] be the unique node at level t that ω passes through. Similar to the notation introduced above, we let s(T[t, ω]) = s(t, ω) be the price at node T[t, ω].

We define the random variables $Y_1, \ldots, Y_n$ on Ω by $Y_t(\omega) = \omega_t$, the t-th component of ω. We define the probability measure Π on Ω to be the unique measure for which the random variables $Y_1, \ldots, Y_n$ are independent and identically distributed (i.i.d.) with $\Pr(Y_i = 1) = p$ and $\Pr(Y_i = -1) = q$. We start with an initial fixed stock price $S_0 = s(T[0, 0])$. For $1 \le t \le n$, the stock price $S_t$ is a random variable defined by $S_t = S_0 u^{\sum_{i=1}^{t} Y_i}$. From the structure of the binomial tree, we have $\Pr(S_t = s(t, i)) = \binom{t}{i} p^i q^{t-i}$, where $0 \le i \le t$. The running total of the stock prices is defined by $T_t = \sum_{i=0}^{t} S_i$, and the running average is $A_t = T_t/(t+1)$. For $\omega \in \Omega$, we let $S_t(\omega) = S_0 u^{\sum_{i=1}^{t} \omega_i}$, $T_t(\omega) = \sum_{i=0}^{t} S_i(\omega)$, and $A_t(\omega) = T_t(\omega)/(t+1)$.

Recall that X is the strike price of a European Asian call. Using the above notation, the price of this call is

$$E((A_n - X)^+) = \frac{1}{n+1} E((T_n - (n+1)X)^+) = \frac{1}{n+1} E(\max(T_n - (n+1)X, 0)).$$

For large n, it is not known whether this quantity can be computed exactly, because the stock price can follow exponentially many paths down the tree. Below, we show how to estimate $E((T_n - (n+1)X)^+)$, from which the price of the option can be easily computed.
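For small n the expectation can still be computed exactly by enumerating all $2^n$ paths, which is useful as a reference when testing approximations. A sketch with exponential running time, returning the undiscounted expected payoff as in the formula above; the function name is hypothetical.

```python
import itertools


def exact_asian_call(s0, u, d, p, n, strike):
    """E((A_n - X)^+) by enumerating every path of the binomial tree (O(n 2^n) time)."""
    expected = 0.0
    for moves in itertools.product((1, -1), repeat=n):
        price, total, prob = s0, s0, 1.0
        for y in moves:
            price *= u if y == 1 else d
            total += price  # the running total T_t includes S_0
            prob *= p if y == 1 else 1.0 - p
        expected += prob * max(total / (n + 1) - strike, 0.0)
    return expected


# One period, u = 2, d = 1/2, r = 0, so p = (1 + r - d) / (u - d) = 1/3.
print(exact_asian_call(100.0, 2.0, 0.5, 1 / 3, 1, 100.0))  # approximately 16.67 (= 50/3)
```

The doubling of work with every extra period is precisely the blow-up that the techniques below avoid.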

3 A New Monte Carlo Algorithm 

Monte Carlo simulation methods for asset pricing were introduced to finance by Boyle [2]. They are very popular in pricing complex instruments, particularly path-dependent European-style options. These methods involve randomly sampling paths $\omega \in \Omega$ according to the distribution Π and computing the payoff $(A_n(\omega) - X)^+$ on each sample. Suppose N samples $\omega^1, \ldots, \omega^N$ are taken from Ω. The price estimate of the call is $\mu = \frac{1}{N}\sum_{i=1}^{N} (A_n(\omega^i) - X)^+$. The accuracy of this estimate depends on the number of simulations N and the variance $\tau^2$ of the payoff: the error bound typically guaranteed by Monte Carlo methods is $O(\tau/\sqrt{N})$. Generally $\tau^2$ is not known, and it must itself be estimated to determine the error bound of μ.
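A minimal sketch of the plain Monte Carlo estimator μ together with the usual estimated standard error $\hat\tau/\sqrt{N}$, which is exactly the quantity for which no analytical guarantee is available. Parameter names are illustrative; `random.Random` is Python's standard seeded generator.

```python
import math
import random


def mc_asian_call(s0, u, d, p, n, strike, num_samples, rng):
    """Return (mu, standard_error) for E((A_n - X)^+) from num_samples random paths."""
    payoffs = []
    for _ in range(num_samples):
        price, total = s0, s0
        for _ in range(n):
            price *= u if rng.random() < p else d  # uptick with probability p
            total += price
        payoffs.append(max(total / (n + 1) - strike, 0.0))
    mu = sum(payoffs) / num_samples
    var = sum((x - mu) ** 2 for x in payoffs) / (num_samples - 1)  # sample variance
    return mu, math.sqrt(var / num_samples)


mu, se = mc_asian_call(100.0, 2.0, 0.5, 1 / 3, 1, 100.0, 100_000, random.Random(1))
print(mu, se)  # the exact value for these parameters is 50/3
```

The standard error is itself an estimate; on paths with rare large payoffs it can be badly off, which motivates bounds that hold analytically.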

A number of techniques are used to reduce the error of μ. For example, [?] uses the control variate technique, which ties the price of the option to the price of another instrument (the control variate) for which an analytically tractable solution is known. The simulation then estimates the difference between the option price and the control variate, which can be determined with greater accuracy than the option price itself. The antithetic variate method [?] follows not only randomly sampled paths ω down the binomial tree, but also the "mirror images" of each ω. None of these techniques has known analytical error bounds for μ.
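A sketch of the antithetic variate method on the binomial tree. When p ≠ 1/2, flipping every tick of ω does not give a path with the same distribution, so this sketch mirrors the underlying uniforms instead ($U \mapsto 1 - U$), which reduces to flipping the ticks when p = 1/2. This construction is our assumption, not necessarily the one used in the cited work.

```python
import random


def antithetic_asian_call(s0, u, d, p, n, strike, num_pairs, rng):
    """Estimate E((A_n - X)^+) by averaging each sampled path with its mirror image."""

    def payoff(uniforms):
        price, total = s0, s0
        for x in uniforms:
            price *= u if x < p else d
            total += price
        return max(total / (n + 1) - strike, 0.0)

    acc = 0.0
    for _ in range(num_pairs):
        us = [rng.random() for _ in range(n)]
        mirrored = [1.0 - x for x in us]  # 1 - U is again uniformly distributed
        acc += 0.5 * (payoff(us) + payoff(mirrored))
    return acc / num_pairs


print(antithetic_asian_call(100.0, 2.0, 0.5, 1 / 3, 1, 100.0, 50_000, random.Random(7)))
```

Each pair is unbiased because the mirrored path has the same distribution as the original one; the gain comes from the negative correlation between the two payoffs.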

Below we use concentration of measure results in conjunction with Monte Carlo simulation to estimate the price of a European Asian call and derive analytical error bounds for this estimate. The error bounds are in terms of the strike price X of the option and the maximum volatility $\sigma_{\max}$ of the underlying stocks.



3.1 Analytical Error Bounds for the Single-Stock Case 



Let $C = e^{E(\ln T_n)}$. In this section, we show that if $(n+1)X/C$ is "small", then $E((T_n - (n+1)X)^+)$ is close to $E(T_n - (n+1)X) = E(T_n) - (n+1)X$ (Theorem 1(1)). In this case the option is deep-in-the-money and will probably be exercised. Since a closed-form formula exists for $E(T_n)$ [?], $E(T_n - (n+1)X)$ can be computed exactly, and our algorithm uses it as its estimate of $E((T_n - (n+1)X)^+)$. On the other hand, if $(n+1)X/C$ is not small, the variance of $(T_n - (n+1)X)^+$ can be bounded from above (Theorem 1(2)), and our algorithm estimates its expectation with bounded error using Monte Carlo simulation.
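Under the risk-neutral measure the closed form for $E(T_n)$ is a geometric sum: since $pu + qd = 1 + r$ and the ticks are independent, $E(S_t) = S_0(1 + r)^t$. A sketch of the resulting deep-in-the-money estimate; the function names are hypothetical, and the result is the undiscounted $E((A_n - X)^+)$.

```python
def expected_total(s0, r, n):
    """E(T_n) = sum_{t=0}^{n} S_0 (1 + r)^t under the risk-neutral measure."""
    return s0 * sum((1.0 + r) ** t for t in range(n + 1))


def deep_itm_estimate(s0, r, n, strike):
    """E(T_n - (n+1)X) / (n+1), used when the call is almost surely exercised."""
    return (expected_total(s0, r, n) - (n + 1) * strike) / (n + 1)


# With u = 2, d = 1/2, r = 0 (p = 1/3): E(T_1) = 100 + (1/3)(200) + (2/3)(50) = 200.
print(expected_total(100.0, 0.0, 1), deep_itm_estimate(100.0, 0.01, 5, 10.0))
```

Note that the closed form depends only on $S_0$, r and n, not on u and d, precisely because p is the risk-neutral probability.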

We first give some theoretical results, then show how these results can be 
used in our Monte Carlo algorithm, BoundedMC. We begin by finding bounds 
A and B such that T„ G [A, B] with high probability. 

Lemma 1. Let $C = e^{E(\ln T_n)}$. For any λ > 0, we have $\Pr(T_n < Ce^{-\sigma\lambda} \text{ or } T_n > Ce^{\sigma\lambda}) < 2e^{-\lambda^2/2}$, where σ is the volatility of the stock.

Proof. This follows from Azuma's Inequality [?].



Now, fix ε > 0 and choose $\lambda_0 = \sqrt{2\ln(2/\epsilon)}$. Then, by Lemma 1, $\Pr(T_n < Ce^{-\sigma\lambda_0} \text{ or } T_n > Ce^{\sigma\lambda_0}) < \epsilon$. Theorem 1(1) says that if $(n+1)X < Ce^{-\sigma\lambda_0}$, then $E((T_n - (n+1)X)^+)$ is close to $E(T_n - (n+1)X)$. Otherwise, Theorem 1(2) says that $\mathrm{Var}((T_n - (n+1)X)^+)$ is bounded.
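The choice of $\lambda_0$ makes the tail bound of Lemma 1 equal to ε exactly, since $2e^{-\lambda_0^2/2} = 2 \cdot (\epsilon/2) = \epsilon$. A short numerical check, together with the case test of Theorem 1; C and σ are simply treated as given numbers here, and the function names are hypothetical.

```python
import math


def lambda0(eps):
    """lambda_0 = sqrt(2 ln(2/eps)), chosen so that 2 exp(-lambda_0^2 / 2) = eps."""
    return math.sqrt(2.0 * math.log(2.0 / eps))


def deep_in_the_money(n, strike, c, sigma, eps):
    """Case (1) of Theorem 1: (n+1)X < C exp(-sigma * lambda_0)."""
    return (n + 1) * strike < c * math.exp(-sigma * lambda0(eps))


eps = 0.01
print(lambda0(eps), 2.0 * math.exp(-lambda0(eps) ** 2 / 2.0))  # the second value equals eps
```

Because $\lambda_0$ grows only like $\sqrt{\ln(1/\epsilon)}$, demanding a much smaller failure probability moves the deep-in-the-money threshold only mildly.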



Theorem 1. Let $C = e^{E(\ln T_n)}$. (1) If $(n+1)X < Ce^{-\sigma\lambda_0}$, then $|E((T_n - (n+1)X)^+) - E(T_n - (n+1)X)| < \epsilon(n+1)X$. (2) If $(n+1)X \ge Ce^{-\sigma\lambda_0}$, then
Var((T„-(n-kl)A)+) < {n + if X"^ . 

Proof. Statement 1. Let φ(t) denote the probability density function of $T_n$. Note first that

$$E((T_n - (n+1)X)^+) = \int_0^\infty (t - (n+1)X)^+ \phi(t)\,dt = \int_{(n+1)X}^\infty (t - (n+1)X)\phi(t)\,dt = E(T_n - (n+1)X) - \int_0^{(n+1)X} (t - (n+1)X)\phi(t)\,dt.$$

Then

$$\begin{aligned} |E((T_n - (n+1)X)^+) - E(T_n - (n+1)X)| &= \left|\int_0^{(n+1)X} (t - (n+1)X)\phi(t)\,dt\right| \le (n+1)X \int_0^{(n+1)X} \phi(t)\,dt \\ &= (n+1)X \Pr(T_n < (n+1)X) \le (n+1)X \Pr(T_n < Ce^{-\sigma\lambda_0}) < \epsilon(n+1)X, \end{aligned}$$

where the second-last inequality follows from the assumption that $(n+1)X < Ce^{-\sigma\lambda_0}$, and the last inequality follows from Lemma 1 and our choice of $\lambda_0$.

Statement 2. We arrive at the result by applying several inequalities derived from the theory of "integration by parts" to $\mathrm{Var}((T_n - (n+1)X)^+)$ and using the assumption that $(n+1)X \ge Ce^{-\sigma\lambda_0}$. A detailed proof is presented in the full paper.

3.2 The BoundedMC Algorithm 

We next use these results in our algorithm. One approach would be to estimate $C = e^{E(\ln T_n)}$ to determine whether we should apply Theorem 1(1) or 1(2). Our algorithm takes a more direct approach. We begin by selecting N samples $\omega^1, \ldots, \omega^N \in \Omega$ and computing $T_n(\omega^1), \ldots, T_n(\omega^N)$. For $1 \le i \le N$, define the random variable $Z_i$ as $Z_i = 1$ if $T_n(\omega^i) < (n+1)X$, and $Z_i = 0$ otherwise. Let $Z = \sum_{i=1}^{N} Z_i$.

Theorem 2. Let 0 < S < 1 be given. With N = 0(log | -I- a^- 2 j ) 

the following statements hold.

1. If $Z/N \le 2\epsilon$, then, with probability $1 - \delta$, $E(T_n - (n+1)X)$ estimates $E((T_n - (n+1)X)^+)$ with error at most $4\epsilon(n+1)X$. Correspondingly, the price of the call is estimated with error at most $4\epsilon X$.

2. If $Z/N > 2\epsilon$, then, with probability $1 - \delta$, $\frac{1}{N}\sum_{i=1}^{N} (T_n(\omega^i) - (n+1)X)^+$ estimates $E((T_n - (n+1)X)^+)$ with standard deviation at most $\epsilon(n+1)X$. Correspondingly, the price of the call is estimated with standard deviation at most $\epsilon X$.

Proof. Statement 1 follows from the proof of Theorem 1(1) and the Chernoff Bound. For Statement 2, we can use the Chernoff Bound and our choice of $\lambda_0$ to show that $(n+1)X > Ce^{-\sigma\lambda_0}$. We then arrive at our result by applying Theorem 1(2). A detailed proof is presented in the full paper.

Algorithm 1 BoundedMC(δ, ε)
generate N = 6>(log | -|- \a- 2 l ) 

let Z be the number of paths ω^i such that T_n(ω^i) < (n + 1)X;
if Z/N ≤ 2ε return (1/(n + 1)) E(T_n − (n + 1)X) = (1/(n + 1)) (E(T_n) − (n + 1)X);
else return (1/N) Σ_{i=1}^{N} (A_n(ω^i) − X)^+.
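A sketch of BoundedMC in Python. The sample size N is passed in as a parameter rather than derived from Theorem 2 (an assumption of this sketch), and $E(T_n)$ uses the risk-neutral closed form $E(T_n) = S_0\sum_{t=0}^{n}(1 + r)^t$, which requires $p = (1 + r - d)/(u - d)$. Names are illustrative; the returned value is the undiscounted $E((A_n - X)^+)$.

```python
import random


def bounded_mc(s0, u, d, r, n, strike, eps, num_samples, rng):
    """Sketch of BoundedMC for a single-stock European Asian call."""
    p = (1.0 + r - d) / (u - d)  # risk-neutral uptick probability
    barrier = (n + 1) * strike   # T_n >= (n+1)X  <=>  A_n >= X
    totals = []
    for _ in range(num_samples):
        price, total = s0, s0
        for _ in range(n):
            price *= u if rng.random() < p else d
            total += price
        totals.append(total)
    z = sum(1 for t in totals if t < barrier)  # paths on which the call is not exercised
    if z / num_samples <= 2 * eps:
        # Deep in the money: use the exact value of E(T_n - (n+1)X) / (n+1).
        expected_total = s0 * sum((1.0 + r) ** t for t in range(n + 1))
        return (expected_total - barrier) / (n + 1)
    # Otherwise: plain Monte Carlo average of (A_n - X)^+ over the same samples.
    return sum(max(t - barrier, 0.0) for t in totals) / (num_samples * (n + 1))


rng = random.Random(3)
print(bounded_mc(100.0, 1.1, 1 / 1.1, 0.01, 5, 10.0, 0.01, 1000, rng))   # deep in the money
print(bounded_mc(100.0, 2.0, 0.5, 0.0, 1, 100.0, 0.01, 100_000, rng))  # close to 50/3
```

Reusing the same samples both for the test on Z and for the estimate mirrors the listing; in the deep-in-the-money branch the simulated payoffs are discarded in favour of the exact expectation.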

3.3 Pricing Basket Options Using BoundedMC 

The results derived above also apply to European Asian basket calls.
If there are m stocks in the basket, each of which has volatility at most
Cmax , we need to take 0(log j -I- sample paths. With 

probability $1 - \delta$, our algorithm will estimate the European Asian basket call
with error at most $4\epsilon X$.
4 A Recursive Bucketing-Based Scheme

The AMO algorithm takes $O(kn^2)$ time to produce a price estimate in the range $[P - nX/k, P]$, where P is the exact price of the call and k is any natural number. As in Section ??, this algorithm estimates $E((T_n - (n+1)X)^+)$, from which the price of the call, $E((A_n - X)^+)$, can be easily estimated. Our recursive scheme is a generalization of a variant of the AMO algorithm. Below we first describe this variant, called Bucketed Tree Traversal (BTT), and then detail our scheme, called Recursive Bucketed Tree Traversal (RecBTT).

Given a binomial tree T of depth n, and numbers t, i, m such that $0 \le t \le n$, $0 \le i \le t$, and $m \le n - t$, let $T_m^{(t,i)}$ be the subtree of depth m rooted at node T[t, i]. Given $0 \le t \le n$, let $\omega|_t$ be the prefix of ω up to level t of T. Note that $\omega|_n = \omega$. Given $\psi, \omega \in \Omega$, we say that ψ is an extension of $\omega|_m$ if, for $0 < t \le m$, we have $\psi_t = \omega_t$. Given another binomial tree U of depth n, we say that $\psi \in \Omega(U)$ is isomorphic to $\omega \in \Omega(T)$ if, for all $0 < t \le n$, $\psi_t = \omega_t$.

Like the AMO algorithm, BTT is based on the following simple observation. Suppose that the running total $T_m(\omega) = T_m(\omega|_m)$ of the stock prices on path $\omega \in \Omega$ exceeds the barrier $B = (n+1)X$. Then, for any extension ψ of $\omega|_m$, $T_n(\psi)$ also exceeds B and the call will be exercised. If we know the call will be exercised on all extensions of $\omega|_m$, it is easy to compute the payoff of the call on these extensions.

As we travel down a path ω, once the running total $T_m(\omega)$ exceeds B, we can just keep track of the running total on extensions ψ of $\omega|_m$ weighted by Π(ψ), from which the value of the option can be computed. Hence, we only need to individually keep track of path prefixes $\omega|_m$ that have running totals $T_m(\omega|_m)$ less than B.

Unfortunately, there may be exponentially many such $\omega|_m$. However, the running totals $T_m(\omega|_m)$ lie in the bounded range [0, B). Rather than trying to keep track of each running total individually, we instead group the running totals terminating at each node into buckets that subdivide this interval. This introduces some round-off error. Suppose we use k buckets to divide [0, B) into equal-length subintervals and use the left endpoint of each interval as the representative value of the running totals contained in that bucket. At each step down the tree, when we put a running total into a bucket, an error of at most B/k is introduced. Traveling down n levels of the tree, the total $T_n(\omega)$ of a path ω is underestimated by at most nB/k, and the average $A_n(\omega)$ is underestimated by at most $\frac{nB}{k(n+1)} = \frac{nX}{k}$.

BTT is detailed in Algorithm 2. At each node v = T[t, i] of T, we create k + 1 buckets to store partial sums of path prefixes terminating at v: k core buckets and one overflow bucket. This overflow bucket is the only difference between BTT and the AMO algorithm. For $0 \le j < k$, core bucket $b_j(v)$ stores the probability mass $b_j(v).\mathrm{mass}$ of path prefixes that terminate at node v and have running totals in its range $\mathrm{range}(j) = [jB/k, (j+1)B/k)$. The representative value of partial sums in the bucket is denoted by $b_j(v).\mathrm{value} = jB/k$. The overflow bucket $b_k(v)$ stores the probability-weighted running total estimates of path prefixes that have estimated running totals exceeding B. This quantity is denoted by $b_k(v).\mathrm{value}$. The probability mass of these path prefixes is denoted by $b_k(v).\mathrm{mass}$. BTT iterates through each of the k + 1 buckets of each of the $(n+1)(n+2)/2$ nodes of T, for a total runtime of $O(kn^2)$.

Algorithm 2 BTT(T, k, B)

for each node v ∈ T and each bucket b_j(v), set b_j(v).mass ← 0;
take j such that the initial price s(T[0, 0]) ∈ range(j); b_j(T[0, 0]).mass ← 1;
for t = 0, ..., n − 1                                  % iterate through each level
  for i = 0, ..., t                                    % iterate through each node at level t
    let v = T[t, i];                                   % shorthand notation for node T[t, i]
    for w ∈ {T[t + 1, i], T[t + 1, i + 1]}             % for each child of node v
      let p' ∈ {p, q} be the probability of going from node v to w;
      for b_j(v) ∈ {b_0(v), ..., b_k(v)}               % for each bucket at node v
        let V ← b_j(v).value + s(w); let M ← b_j(v).mass × p';
        add mass M to the bucket corresponding to value V;
return Σ_{i=0}^{n} b_k(T[n, i]).mass × (b_k(T[n, i]).value − B);
% return the option price estimated from the overflow buckets at the leaves¹
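A runnable sketch of BTT, assuming nodes are indexed so that $s(t, i) = S_0 u^i d^{t-i}$ (i upticks) and that the overflow bucket keeps the mass-weighted mean of its running totals, so that mass × (value − B) is its contribution to $E((T_n - B)^+)$. An exhaustive enumeration is included only as a check on small trees; both function names are hypothetical. In this sketch the root price is bucketed as well, so each path loses less than (n + 1)B/k.

```python
import itertools


def btt(s0, u, d, p, n, k, barrier):
    """Sketch of Bucketed Tree Traversal: approximates E((T_n - B)^+) with B = barrier."""
    q = 1.0 - p
    width = barrier / k

    def new_node():
        # k core bucket masses, plus the overflow bucket [mass, mass-weighted mean total]
        return [[0.0] * k, [0.0, 0.0]]

    def add(node, value, mass):
        core, over = node
        if value >= barrier:
            total_mass = over[0] + mass
            over[1] = (over[0] * over[1] + mass * value) / total_mass
            over[0] = total_mass
        else:
            # the representative value of core bucket j is its left endpoint j * width
            core[min(int(value // width), k - 1)] += mass

    level = [new_node()]
    add(level[0], s0, 1.0)  # the running total T_0 = S_0 starts at the root
    for t in range(n):
        nxt = [new_node() for _ in range(t + 2)]
        for i, (core, over) in enumerate(level):
            # children of T[t, i]: T[t+1, i] (downtick, prob q) and T[t+1, i+1] (uptick, prob p)
            for child, prob in ((i, q), (i + 1, p)):
                price = s0 * u ** child * d ** (t + 1 - child)
                for j, mass in enumerate(core):
                    if mass > 0.0:
                        add(nxt[child], j * width + price, mass * prob)
                if over[0] > 0.0:
                    add(nxt[child], over[1] + price, over[0] * prob)
        level = nxt
    # expected payoff read off the overflow buckets at the leaves
    return sum(over[0] * (over[1] - barrier) for _, over in level)


def exact_excess(s0, u, d, p, n, barrier):
    """Reference value of E((T_n - B)^+) by enumerating all 2^n paths."""
    result = 0.0
    for moves in itertools.product((True, False), repeat=n):
        price, total, prob = s0, s0, 1.0
        for up in moves:
            price *= u if up else d
            total += price
            prob *= p if up else 1.0 - p
        result += prob * max(total - barrier, 0.0)
    return result


d = 1 / 1.1
p = (1.01 - d) / (1.1 - d)  # risk-neutral probability for r = 0.01
# n = 6 periods and strike X = 100, so B = (n + 1)X = 700.
approx = btt(100.0, 1.1, d, p, n=6, k=2000, barrier=700.0)
exact = exact_excess(100.0, 1.1, d, p, n=6, barrier=700.0)
print(approx, exact)
```

Keeping only a mean in the overflow bucket is enough because the payoff is linear in the running total once the barrier has been crossed.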

We propose RecBTT, a recursive extension of BTT. Consider some level t in our binomial tree T and assume that the weights of all path prefixes terminating at level t have been put into the appropriate buckets. BTT uses these weights to compute the bucket weights of nodes at level t + 1. In contrast, RecBTT recursively solves the problem for subtrees $T_m^{(t,i)}$, $0 \le i \le t$, of some depth $m \le n - t$ rooted at node T[t, i]². As each recursive call completes, RecBTT merges the bucket weights at the leaves of $T_m^{(t,i)}$ into the corresponding nodes at level t + m of T (procedure Merge). The advantages of the recursive calls are twofold. (1) They use finer bucket granularity, resulting in improved accuracy. (2) The results of a single recursive call on a particular subtree $T_m^{(t,i)}$ are used by the procedure Estimate to approximate the results of other recursive calls to other subtrees $T_m^{(t,j)}$, where j > i, as long as the node prices in $T_m^{(t,j)}$ are "sufficiently close" to the corresponding node prices in $T_m^{(t,i)}$. This improves the runtime, since we do not need to make all t + 1 of the recursive calls, so there are portions of T that we do not directly traverse.



4.1 The Merge Procedure 

Consider a recursive call on the subtree $T_1 = T_{n_1}^{(t_0,i_0)}$ of depth $n_1$ rooted at node $v_0 = T[t_0, i_0]$. A leaf $v_1 = T_1[n_1, i_1]$ of $T_1$ ($0 \le i_1 \le n_1$) corresponds to the node $v_2 = T[t_0 + n_1, i_0 + i_1]$ of T. Merge incorporates the bucket weights at $v_1$ into the bucket weights at $v_2$. Recall that the recursive call on $T_1$ is made with finer bucket granularity. Assume we use $k_1 = h_1 k_0$ core buckets instead of just
¹ This is the expected payoff at maturity. For the price at the start of the contract, we must discount the payoff according to the interest rate and length of the contract.

² Actually, the trees on which we recursively solve the problem are not exactly the $T_m^{(t,i)}$. Each is identical to the respective $T_m^{(t,i)}$, except that the price at the root is changed from s(T[t, i]) to 0. The reason for this is explained in Remark 1.
$k_0 = k$. We first combine each group of $h_1$ buckets at $v_1$ into a single bucket, so that we are left with $k_0$ buckets to merge into $v_2$. When we refer to a bucket $b_{j_1}(v_1)$ below, $0 \le j_1 < k_0$, we mean one of these $k_0$ combined buckets.

The Merge procedure is described in Algorithm 3. Consider first the core buckets. Let $0 \le j_0, j_1, j_2 < k_0$ denote core bucket indices. Bucket $b_{j_0}(v_0)$ contains the mass of path prefixes in T terminating at $v_0$ whose running total estimates fall into the interval $\mathrm{range}(j_0) = [j_0 B/k_0, (j_0+1)B/k_0)$. Bucket $b_{j_1}(v_1)$ contains the mass of full paths in $T_1$ terminating at node $v_1$ whose running total estimates fall into the interval $\mathrm{range}(j_1)$. This is equal to the mass of partial paths in T starting at node $v_0$ and terminating at node $v_2$ whose running total estimates fall into the interval $\mathrm{range}(j_1)$. Merging $T_1$ into T involves merging each leaf $v_1$ of $T_1$ into the corresponding node $v_2$ of T. Once the merging procedure is done, $b_{j_2}(v_2).\mathrm{mass}$ is updated to contain the weight of path prefixes in T passing through node $v_0$ and terminating at node $v_2$ that have running total estimates in the interval $\mathrm{range}(j_2)$. The overflow buckets are handled similarly.

Algorithm 3 Merge(T, T_1 = T_{n_1}^{(t_0, i_0)})

let v_0 ← T[t_0, i_0];
for i_1 = 0, ..., n_1                                  % for each leaf of T_1
  let v_1 ← T_1[n_1, i_1], v_2 ← T[t_0 + n_1, i_0 + i_1];
  for j_0 = 0, ..., k_0                                % buckets in v_0
    for j_1 = 0, ..., k_0                              % buckets in v_1
      let V ← b_{j_0}(v_0).value + b_{j_1}(v_1).value; let M ← b_{j_0}(v_0).mass × b_{j_1}(v_1).mass;
      add mass M to the bucket of v_2 corresponding to value V.
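A sketch of the core-bucket part of Merge, assuming bucket j of width w = B/k_0 has representative value j·w at both $v_0$ and $v_1$ (so representative values add exactly) and that the root price of $T_1$ was set to 0 as in Remark 1. Combining the $k_1 = h_1 k_0$ fine buckets of the recursive call is included; sums that reach B are collected as an overflow mass and a mass-weighted total. Overflow buckets of $v_0$ and $v_1$ are omitted for brevity, and all names are hypothetical.

```python
def combine(fine, h):
    """Sum each group of h consecutive fine buckets into one coarse bucket."""
    return [sum(fine[j * h:(j + 1) * h]) for j in range(len(fine) // h)]


def merge_core(core0, core1, width):
    """Contribution of the core buckets of v0 and v1 to v2, in O(k^2) time."""
    k = len(core0)
    core2 = [0.0] * k
    overflow_mass = 0.0
    overflow_weighted_total = 0.0  # sum of mass * representative running total
    for j0, m0 in enumerate(core0):
        for j1, m1 in enumerate(core1):
            mass = m0 * m1  # prefix to v0 followed by a partial path from v0 to v2
            j2 = j0 + j1    # the value j0*w + j1*w lies in bucket j0 + j1
            if j2 < k:
                core2[j2] += mass
            else:
                overflow_mass += mass
                overflow_weighted_total += mass * j2 * width
    return core2, overflow_mass, overflow_weighted_total


core1 = combine([0.1, 0.15, 0.3, 0.45, 0.0, 0.0, 0.0, 0.0], 2)  # k1 = 8 fine -> k0 = 4 buckets
print(core1, merge_core([0.5, 0.5, 0.0, 0.0], core1, width=1.0))
```

Each true sum lies in $[(j_0 + j_1)w, (j_0 + j_1 + 2)w)$, so placing it in bucket $j_0 + j_1$ loses at most one extra bucket width, as stated in Lemma 2.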



Remark 1. Notice that the price at $v_0$ is counted twice: once in the path prefix from the root T[0, 0] to $v_0$, and once in the partial path between $v_0$ and $v_2$. To address this issue, when we recursively solve the problem on the subtree $T_1$, we set the price $s(T_1[0, 0])$ at the root to 0, ensuring that this price is counted only once. This modification does not change our algorithms.

Lemma 2. For an arbitrary node v, let E(v) be the maximum amount by which running totals terminating at node v are underestimated by the bucket values. Using the Merge algorithm, we have $E(v_2) \le E(v_0) + E(v_1) + B/k_0$.

Proof. When the paths of merged trees are concatenated, the errors are summed. An extra error of one bucket size may be introduced, since merged buckets cover the range of two buckets but are put into a single bucket.

Lemma 3. Merge can be made to run in 0{niklogk) time. 

Proof. As implemented in Algorithm 3, Merge runs in $O(n_1 k^2)$ time.
However, the core buckets can be merged with a faster technique. Core bucket
masses at $v_2$ are the convolution of core bucket masses at $v_0$ and $v_1$ with an
appropriate offset. This convolution can be computed in $O(k \log k)$ time with the
Fast Fourier Transform (FFT).
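The faster merge in the proof of Lemma 3 can be sketched by replacing the core-by-core double loop with an FFT convolution. This is a hedged illustration, not the authors' implementation: it uses a textbook recursive radix-2 Cooley-Tukey FFT written in pure Python (in practice one would call an FFT library), it assumes the same hypothetical bucket convention as a list of masses with the overflow bucket last and core bucket $j$ holding value $j$ times a shared width, and it folds every pair that involves an overflow bucket, or whose offset reaches the barrier, into the overflow bucket. The names `fft`, `convolve`, and `merge_bucket_masses_fft` are illustrative.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two.
    With invert=True the conjugate twiddle factors are used (caller divides by n)."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for t in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * t / n) * odd[t]
        out[t] = even[t] + twiddled
        out[t + n // 2] = even[t] - twiddled
    return out


def convolve(x, y):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    size = len(x) + len(y) - 1
    n = 1
    while n < size:
        n *= 2
    fx = fft(list(x) + [0.0] * (n - len(x)))
    fy = fft(list(y) + [0.0] * (n - len(y)))
    spectrum = [a * b for a, b in zip(fx, fy)]
    return [v.real / n for v in fft(spectrum, invert=True)[:size]]


def merge_bucket_masses_fft(masses_v0, masses_v1):
    """Same result as the direct double loop; core x core pairs use the FFT."""
    k = len(masses_v0) - 1                            # index k = overflow bucket
    core = convolve(masses_v0[:k], masses_v1[:k])     # mass for each offset j0 + j1
    merged = core[:k] + [0.0]
    # core pairs whose offset reaches the barrier, plus every pair touching overflow
    merged[k] = (sum(core[k:]) + masses_v0[k] * sum(masses_v1)
                 + masses_v1[k] * sum(masses_v0[:k]))
    return merged
```

Up to floating-point rounding, the result agrees with the direct $O(k^2)$ double loop, while the core-by-core part costs $O(k \log k)$.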
4.2 The Estimate Procedure

Let $T_1$ and $T_2$ be two subtrees of $T$, where $t_2 > t_1$. We now
describe the Estimate($T_2$, $T_1$) procedure, which estimates the weights in the
leaf buckets of $T_2$ from the weights in the leaf buckets of $T_1$. This saves us the
work of recursively solving the problem on $T_2$.

Estimate is described in Algorithm 4. It uses the following fact. Given any
node $v_1 = T_1[t, i]$ in $T_1$, let $v_2 = T_2[t, i]$ be the corresponding node in $T_2$. Notice
that there is a constant $a > 1$ such that for all $(v_1, v_2)$ pairs, $s(v_2) = a\,s(v_1)$.
Hence, for any path $\psi \in \Omega(T_2)$, we have $T_m(\psi) = a\,T_m(\omega)$, where $\omega \in \Omega(T_1)$ is
isomorphic to $\psi$.

Algorithm 4 Estimate(T_2, T_1)
  for i = 0, ..., m                     % go through the leaf buckets of T_1 and T_2
    let v_1 ← T_1[m, i], v_2 ← T_2[m, i];
    for j_1 = 0, ..., k                 % go through each bucket at v_1
      let V ← a · b_{j_1}(v_1).value; let M ← b_{j_1}(v_1).mass;
      add mass M to bucket corresponding to value V.
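Algorithm 4 can be sketched in Python under a hypothetical bucket convention (masses only, overflow bucket last, core bucket $j$ holding value $j$ times a shared width). Scaling a value in bucket $j_1$ by $a$ then lands it in bucket $\lfloor a j_1 \rfloor$, capped at the overflow bucket; masses are moved unchanged, as in the pseudocode. The function names are illustrative assumptions.

```python
def estimate_leaf(masses_v1, a):
    """Estimate bucket masses at a leaf v2 of T2 from the matching leaf v1 of T1.

    Every price in T2 is a times the corresponding price in T1 (a > 1), so a
    path total V in T1 becomes a * V in T2.  Assumes core bucket j holds the
    value j * width, so a * j * width falls in bucket floor(a * j); the last
    index k is the overflow bucket, which stays overflow because a > 1.
    """
    k = len(masses_v1) - 1
    masses_v2 = [0.0] * (k + 1)
    for j1, mass in enumerate(masses_v1):          # go through each bucket at v1
        j = k if j1 == k else min(int(a * j1), k)  # bucket for V = a * value
        masses_v2[j] += mass                       # M is moved unchanged
    return masses_v2


def estimate(leaves_t1, a):
    """Apply estimate_leaf to every leaf i = 0..m of T1."""
    return [estimate_leaf(masses, a) for masses in leaves_t1]
```

With $a = 1.5$ and $k = 3$, for instance, core bucket 2 is scaled to three bucket widths and therefore moves into the overflow bucket; the total mass is preserved.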

Lemma 4. Suppose that $a \le 2$ and assume that the total path sums in $T_1$ are
underestimated by our bucketing scheme by at most $E$. Then Estimate underestimates
the total path sums in $T_2$ by at most $2E$ plus two bucket widths.

Proof. The original error is amplified by $a \le 2$ in the estimation, accounting for
the first error term. The second error term comes from the fact that the range of
each bucket in $T_1$ expands to cover at most 3 buckets in $T_2$, all of whose masses
are put into a single bucket.

Lemma 5. Suppose that we would like to determine the leaf bucket weights of
the subtrees $\ldots$, where $0 < i < t$. We need only call RecBTT at most once
for every $O(\ldots)$ subtrees, and use the Estimate procedure to estimate the leaf
bucket weights of the other subtrees with bounded error.

Proof. From the structure of the binomial tree, we can apply Estimate between
subtrees whose indices differ by as much as $O(\ldots)$ while keeping the error bounded.

4.3 Error and Runtime Analysis 

Using the results of Sections 4.1 and 4.2, we can derive recursive expressions for
the error and runtime of RecBTT. Suppose there are $n_0$ trading periods and
we use a total of $k_0$ buckets per node. For $i > 0$, let $n_i$ be the number of trading
periods and $k_i$ the number of buckets per node we use in the $i$-th subproblem (at
the $i$-th level of the recursion). Theorem 3 shows how to choose $n_i$ and $k_i$
for an error/runtime tradeoff that is stronger than that of the AMO algorithm.
Theorem 3. Given an integer $R \ge 2$, let $\ldots$, and for $i > 0$, let $n_i = \ldots$
and $k_i = \ldots$, where $\sigma$ is the volatility of the stock. RecBTT underestimates
$E((T_n - (n+1)X)^+)$ by at most $\ldots$ and takes time $O(\ldots)$.
4.4 Customization of the RecBTT Scheme

A key strength of the RecBTT scheme is that the number of recursive calls, 
the size of each recursive call, and the pricing method by which we solve the 
base case of the recursion (the final recursive call) can be custom tailored for the 
application. The base case of the recursion can be solved with any option pricing 
scheme, including BoundedMC, other variants of the Monte Carlo algorithm, 
the AMO algorithm, or exhaustive path traversal. 

In practice, larger values of $n$ appear in several applications. Long-term options
contracts, LEAPS [...], are negotiated for exercise dates several years in
advance. Companies also offer stock options to employees over periods of up to
five years or more. Finally, in Istanbul options contracts, rather than computing
the average based on daily prices, the average is computed over prices taken
at higher frequencies. In each of these cases, $n$ is large enough that several
recursive calls are required to reduce the problem to a manageable size.

5 Pricing Basket Options 

The bucketed tree structure created by BTT and RecBTT can be used to
price various kinds of European basket options as well. Here, we describe how
to price European Asian basket options. A basket option [...] is composed of $m$
stocks, $z_1, \ldots, z_m$. For each stock $z_i$, we construct a binomial tree (according
to the respective stock's volatility), as described earlier. For $1 \le t \le n$
and $1 \le i \le m$, let $S_t^i$ be the random variable denoting the price of stock $z_i$
on day $t$, and define $S_t = \sum_{i=1}^{m} S_t^i$ to be the random variable denoting the total
of the stock prices on day $t$. Recall that the payoff of a European basket call
with strike price $X$ is $E((S_n - X)^+)$. Letting $A_n = \frac{1}{n+1}\sum_{t=0}^{n} S_t$ be the average
total stock price, the payoff of a European Asian basket call with strike price
$X$ is $E((A_n - X)^+)$. Pricing basket options involves additional complexity
that does not appear in their single-stock counterparts: the number of paths
that the total basket price can follow is exponential not only in the number
of trading periods $n$ but also in the number of stocks $m$. Basket options are usually
priced using traditional Monte Carlo methods. The scheme we describe here is
the first pricing scheme for any kind of basket option that runs in polynomial time
in both the number of stocks and the number of trading periods and has provably
small error bounds.

We will call our European Asian basket call pricing algorithm BasketBTT.
Let $B = (n+1)X$, where $X$ is the strike price of the basket option. For each
stock $z_i$, $1 \le i \le m$, use RecBTT to construct the bucketed binomial tree
structure $T^i$ described earlier, this time using $B$ as the barrier; should the
running total of any $z_i$ exceed $B$, the basket option will always be exercised,
regardless of what the other stocks do. For each stock $z_i$, we construct $k + 1$
superbuckets $\beta_j^i$, $0 \le j \le k$, where $\beta_j^i$ is the combination of buckets $b_j(v)$ for
all leaves $v \in T^i$. For the core buckets $\beta_j^i$, $0 \le j < k$, let $\beta_j^i.\mathrm{value} = \ldots$ and
$\beta_j^i.\mathrm{mass} = \sum_{\ell=0}^{n} b_j(T^i[n, \ell]).\mathrm{mass}$, where this summation ranges over all leaves
$T^i[n, \ell]$ of $T^i$. For the overflow bucket $\beta_k^i$, let $\beta_k^i.\mathrm{mass} = \sum_{\ell=0}^{n} b_k(T^i[n, \ell]).\mathrm{mass}$
and $\beta_k^i.\mathrm{value} = \sum_{\ell=0}^{n} b_k(T^i[n, \ell]).\mathrm{value} \times b_k(T^i[n, \ell]).\mathrm{mass}$.
Handling overflow superbuckets: If the running total of a stock $z_i$ reaches
the overflow superbucket $\beta_k^i$, the option will be exercised regardless of what the
other stocks do. Given this, we can determine the value of the option exactly,
since $E((T_n - (n+1)X)^+) = E(T_n - (n+1)X) = E(T_n) - (n+1)X = \beta_k^i.\mathrm{value} + \ldots \cdot E(T_n^{i'}) - (n+1)X$,
where $T_n^{i'}$ is the random variable denoting the running
total of stock $z_{i'}$ up to day $n$.

Handling core superbuckets: Consider now the core superbuckets $\beta_j^i$, $0 \le
j < k$. Let $f_i(x) = \sum_{j=0}^{k-1} \beta_j^i.\mathrm{mass} \cdot x^j$ be the polynomial representation of the
core bucket masses of stock $z_i$, and let $f(x) = \prod_{i=1}^{m} f_i(x)$. This product can
be computed efficiently using the Fast Fourier Transform. Notice that $f(x)$ has
the form $f(x) = b_0 x^0 + b_1 x^1 + \cdots + b_{m(k-1)} x^{m(k-1)}$. From the definition of
$f(x)$, observe that $b_j$ is just the probability that the sum (over all stocks $z_i$) of
running totals $T_n^i$ from the core buckets falls in the range $[j\frac{B}{k}, (j+1)\frac{B}{k})$. Hence,
the contribution to the option price from the core buckets can be estimated by
$\sum_{j \ge k} b_j \left(j\frac{B}{k} - (n+1)X\right)$.
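The core-superbucket computation can be checked numerically on toy inputs. The Python sketch below is hedged: it assumes, as an interpretation of the range above, that core bucket $j$ of each stock stands for a running total of $jB/k$ with $B = (n+1)X$; it multiplies the polynomials $f_i$ with a schoolbook product instead of the FFT for brevity; and the function names are hypothetical.

```python
def poly_mul(p, q):
    """Schoolbook product of two coefficient lists (the text uses the FFT)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def core_contribution(core_masses, n, X):
    """Estimate the core-superbucket contribution to a European Asian basket call.

    core_masses[i][j] plays the role of beta_j^i.mass for stock i and core
    bucket j (0 <= j < k).  Hypothetical assumption: core bucket j stands for
    the running total j * B / k with barrier B = (n + 1) * X.
    """
    k = len(core_masses[0])
    B = (n + 1) * X
    f = [1.0]
    for f_i in core_masses:            # f(x) = prod_i f_i(x)
        f = poly_mul(f, f_i)
    # b_j = f[j]; only combined totals at or above B = (n + 1) X pay off
    return sum(b_j * (j * B / k - (n + 1) * X)
               for j, b_j in enumerate(f) if j >= k)
```

For three stocks whose core masses are $(0.5, 0.5)$ each, with $k = 2$, $n = 1$, and $X = 1$, the product is $f(x) = 0.125 + 0.375x + 0.375x^2 + 0.125x^3$; the term $j = 2$ contributes nothing, and $b_3$ contributes $0.125 \cdot (3 - 2) = 0.125$.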

Theorem 4. Given $n$, $m$, $k$, $R \ge 2$, $\ldots$, $\sigma_{\min}$, and $\sigma_{\max}$, if we apply
RecBTT as described in Theorem 3 to construct the bucketed binomial tree
for each stock, BasketBTT has an error of $O(m \ldots)$ and runs in time
$O(\ldots mk(\ldots + \log \ldots) + mk \log m \log k + mk \log^2 m)$.
Abstract. Competitive auctions encourage consumers to bid their utility
values while achieving revenue close to that of fixed pricing with perfect
market analysis. These auctions were introduced in [...] in the context
of selling an unlimited number of copies of a single item (e.g., rights to
watch a movie broadcast). In this paper we study the case of multiple
items (e.g., concurrent broadcast of several movies). We show auctions
that are competitive for this case. The underlying auction mechanisms
are more sophisticated than in the single item case, and require solving
an interesting optimization problem. Our results are based on a sampling
problem that may have other applications.



1 Introduction 

Consider an airplane flight where passengers have individual movie screens and 
can choose to view one out of a dozen movies that are broadcast simultaneously. 
The flight is only long enough for one movie to be seen. The airline wants to price 
movies to maximize its revenue. Currently, airlines charge a flat fee for movies. 
Even if the fee is based on a careful marketing study, passenger demographics 
may vary from one flight to another, and individual utilities can vary with flight 
route, time of the year, etc. Therefore, non-adaptive pricing is unlikely to be
optimal for the seller. We investigate adaptive pricing via auctions. 

We consider the problem of selling several items, with each item available 
in unlimited supply. By unlimited supply we mean that either the seller has at 
least as many items as there are consumers, or that the seller can reproduce 
items on demand at negligible marginal cost. Of particular interest are digital 
and broadcast items. With unlimited supply, consumer utilities, the maximum 
price a consumer is willing to pay for an item, are the sole factor determining sale 
prices and number of items sold. We assume that each consumer has potentially 
different utilities for different items, and needs one item only. The seller’s goal 
is to set prices to maximize total revenue. 

In the scarce supply case, multiple item auctions have been studied by Shapley
and Shubik (see [...] for a survey of the area). Results for the scarce
case, however, do not directly apply to the unlimited supply case. Consider the
case where each item for sale is unique, for example the real estate market
considered in [...]. In this case consumers will bid heavily for highly desirable
items, which will sell for a high price. In contrast, in the unlimited supply case
the seller can in principle give every consumer a copy of the item the consumer
desires most. However, in such an auction, the consumer has no incentive to bid
high. Thus a good auction mechanism must in some cases limit the number of
copies of each item.

A consumer's utility value for an item is the most they are willing to pay for
that item. We would like to develop auctions in which rational consumers bid
their utilities. In game theory, such auctions are called truthful and are special
cases of strategyproof mechanisms, which have been studied for a long time. For
example, the Vickrey-Clarke-Groves mechanism [...] maximizes the general
welfare of a system. The Shapley Value [...] mechanism shares costs among the
participants. Recent work in the Computer Science community combines economic
or game-theoretic questions with computational questions or techniques;
see, e.g., [...].

Our previous work [...] addressed a special case of the unlimited supply
auction problem for a single item. In particular, we introduced competitive auctions,
which are truthful and at the same time attain revenues close to that of
fixed pricing with perfect market analysis. As the term suggests, competitive
analysis of auctions is similar in spirit to the analysis of on-line algorithms; see,
e.g., [...]. We introduced several randomized auctions which are competitive
under certain assumptions and showed some impossibility results, including the
nonexistence of deterministic competitive auctions.

In this paper we extend some of these results to multiple item auctions. In
particular, we develop competitive auctions based on random sampling. These
auction mechanisms are intuitive but more sophisticated than in the single item
case. We introduce a multiple item variant of the random sampling auction and
of the dual price auction and show that these auctions are competitive under
certain assumptions. We also discuss a deterministic auction. Although this
auction is not competitive in the worst case, its single item variant worked well
in most cases in the experimental study [...].

Our work uses the relationship between multiple item auctions and mathematical
programming pointed out by Shapley and Shubik. For our random
sampling auction we need to solve the following subproblem, which is interesting
in its own right: given the consumers' utilities, find item prices that maximize
the seller's revenue. We state this problem as a nonlinear mathematical program.

One of our main results is on a sampling problem that may be of independent
interest. A variant of the sampling problem is as follows. Suppose we have
$n$ applicants and $m$ tests. Each applicant takes each test and gets a real-valued
score. We have to select $k$ applicants based on the results of these scores.
Furthermore, suppose that we choose a random subset of the applicants, call the
applicants in the subset red, and call the remaining applicants blue. After the
results of the tests are known and the subset is selected, an adversary selects
the $k$ winning applicants while obeying the following restriction: if an applicant
$x$ is accepted and, for every test, applicant $y$ gets a score that is at least as good
as the score of $x$, then $y$ must be accepted as well. The adversary's goal is to bias
the admission in favor of red applicants. Although we study a slightly different
problem, our techniques can be used to show that if $k = \omega(m^2 \log n)$, then with
high probability the ratio of the number of red applicants to the number of blue
applicants is bounded by a constant.

This problem seems natural. One can view candidates as points in m- 
dimensional space, and view the adversary as selecting a shift of the positive 
quadrant so that the shifted quadrant contains k points total and as many red 
points as possible. 
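The geometric view can be made concrete with a brute-force sketch in Python. It is illustrative only: it restricts the adversary to a single shifted quadrant $\{x : x \ge c\}$ as in the paragraph above, and it enumerates only corners $c$ whose coordinates are taken from the points themselves, which suffices because moving the corner up to the componentwise minimum of the selected points does not change which points the quadrant contains. The running time is exponential in $m$, and the function names are hypothetical.

```python
from itertools import product


def in_quadrant(point, corner):
    """True if point >= corner in every coordinate (a shifted positive quadrant)."""
    return all(x >= c for x, c in zip(point, corner))


def adversary_best_quadrant(points, is_red, k):
    """Among shifted quadrants holding exactly k points, return (red count, corner)
    for one with the most red points, or None if no quadrant holds exactly k."""
    m = len(points[0])
    grids = [sorted({p[d] for p in points}) for d in range(m)]
    best = None
    for corner in product(*grids):
        inside = [i for i, p in enumerate(points) if in_quadrant(p, corner)]
        if len(inside) != k:
            continue
        reds = sum(1 for i in inside if is_red[i])
        if best is None or reds > best[0]:
            best = (reds, corner)
    return best
```

For the five points $(1,1), (2,2), (3,3), (1,3), (3,1)$ with $(2,2)$ and $(3,3)$ red and $k = 2$, the best corner is $(2,2)$, which selects both red points.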

In on-line markets, with rapid changes and the availability of computer trad- 
ing tools and agents, pricing using auctions is sometimes attractive. Competitive 
auctions for multiple unlimited supply items may be useful in some of these sce- 
narios. 

2 Background 

The input to an auction is a number of bidders, $n$, a number of items, $m$, and a
set of bids $\{a_{ij}\}$. We assume that all bids are nonnegative and that there is no
collusion among the bidders. We study the case when each bidder wants only a
single item.

Given a set of bids, the outcome of an auction is an assignment of a subset
of (winning) bidders to items. Each bidder $i$ in the subset is assigned a single
item $j$ and a sales price of at most $a_{ij}$. An item can be assigned to any number
of bidders. A deterministic auction mechanism maps auction inputs to auction
outcomes. A randomized auction mechanism maps inputs to probability distributions
on auction outcomes. We use $\mathcal{R}$ to denote the auction revenue for a
particular auction mechanism and set of bids; $\mathcal{R}$ is the sum of all sale prices.
For randomized auctions, $\mathcal{R}$ is a random variable. We will assume that the $m$-th
item is a dummy item of no value and that all bidders have utility of zero for this
item ($a_{im} = 0$ for all $i$). Losing is then equivalent to being assigned the dummy
item at cost zero.

We say that an auction is single-price if the sale prices for copies of the same 
item are the same, and multiple-price otherwise. 

Next we define truthful auctions, first introduced by Vickrey [...]. Let $u_{ij}$ be
bidder $i$'s utility value for item $j$. Define a bidder's profit to be the difference
between the bidder's utility value for the item won and the price the bidder pays
if they win the auction, or zero if they lose. An auction is truthful if bidding $u_{ij}$ is
a dominant strategy for bidder $i$. In other words, the bidder's profit (or expected
profit, for randomized auctions), as a function of the bidder's bids $(a_{i1}, \ldots, a_{im})$,
is maximized at the bidder's utility values $(u_{i1}, \ldots, u_{im})$, for any fixed values of
the other bidders' bids. Truthfulness is a strong condition for auctions: bidding



Competitive Auctions for Multiple Digital Goods 419 



utility maximizes the profit of the bidder no matter what the other bidders'
strategies are. When considering truthful auctions, we assume that $a_{ij} = u_{ij}$
unless mentioned otherwise.

To enable analysis of auction revenue we define several parameters of an
input set of bids. The revenue for optimal fixed pricing is $\mathcal{F}$. Note that $\mathcal{F}$ can
also be interpreted as the revenue of the optimal nontruthful single-price
auction. Other parameters that we use in the analysis are $\ell$, the lowest bid value,
and $h$, the highest bid value. Because bids can be arbitrarily scaled, we assume,
without loss of generality, that $\ell = 1$, in which case $h$ is really the ratio of the
highest bid to the lowest bid.

Analogous to on-line algorithm theory, we express auction performance relative
to that of the optimal nontruthful auction, as ratios $\mathcal{R}/\mathcal{F}$. However, we
solve a maximization problem, while on-line algorithms solve minimization problems.
Thus, positive results, which are lower bounds on $\mathcal{R}/\mathcal{F}$, are expressed using
“$\Omega$”.

Note that $h$ and $\mathcal{F}$ are used only for analysis. Our auctions work without
knowing their values in advance.

As shown in [...], if we do not impose any restrictions on $h$, we get the upper
bound $\mathcal{R}/\mathcal{F} = O(1/h)$. To avoid this upper bound on auction revenue, we
can assume that the optimal revenue $\mathcal{F}$ is significantly larger than
$h$, the highest bid. With this assumption, optimal fixed pricing sells many items.

We say that an auction is competitive under certain assumptions if, when the
assumptions hold, the revenue is $\Omega(\mathcal{F})$.

For convenience, we assume that the input bids are non-degenerate, i.e., all
input bid values $u_{ij}$ are distinct or zero. This assumption can be made without
loss of generality because we can always apply a random perturbation or use
lexicographic tie-breaking to achieve it.

As shown for the single-commodity case [...], no deterministic auction is competitive
in the worst case. Our competitive auctions are randomized. We use the
following lemma, which is a variation of the Chernoff bound (see, e.g., [...]), as
the main tool in our analysis.

Lemma 1. Consider a set $A$ and its subset $B \subseteq A$. Suppose we pick an integer
$k$ such that $0 < k < |A|$ and a random subset (sample) $S \subseteq A$ of size $k$. Then
for $0 < \delta < 1$ we have

$$\Pr\left[|S \cap B| < (1-\delta)\,|B| \cdot k/|A|\right] < \exp\left(-|B| \cdot k\,\delta^2/(2|A|)\right).$$
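Lemma 1 can be sanity-checked empirically. The Python sketch below (function name and parameters are hypothetical) draws samples of exactly $k$ points without replacement, takes $B$ to be the first $|B|$ points without loss of generality, and compares the observed frequency of $|S \cap B| < (1-\delta)|B|k/|A|$ with the bound $\exp(-|B|k\delta^2/(2|A|))$. A simulation of this kind illustrates the inequality; it does not prove it.

```python
import math
import random


def lemma1_check(size_a, size_b, k, delta, trials=5000, seed=0):
    """Estimate Pr[|S intersect B| < (1 - delta)|B|k/|A|] for a uniform k-subset S
    of A, and return it together with the bound exp(-|B| k delta^2 / (2|A|))."""
    rng = random.Random(seed)
    threshold = (1 - delta) * size_b * k / size_a
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(size_a), k)        # exactly k points, no replacement
        in_b = sum(1 for x in sample if x < size_b)  # B = the first |B| points (WLOG)
        if in_b < threshold:
            hits += 1
    bound = math.exp(-size_b * k * delta ** 2 / (2 * size_a))
    return hits / trials, bound
```

With $|A| = 200$, $|B| = 100$, $k = 50$, and $\delta = 0.3$, the bound is $e^{-1.125} \approx 0.32$, and the observed frequency should come out well below it.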

Proof. We refer to elements of $A$ as points. Note that $|S \cap B|$ is the number
of sample points in $B$, and its expected value is $|B| \cdot k/|A|$. Let $p = k/|A|$. If
instead of selecting a sample of size exactly $k$ we chose each point to be in the
sample independently with probability $p$, then the Chernoff bound would yield
the lemma.

Let $A = \{a_1, \ldots, a_n\}$ and, without loss of generality, assume that $B =
\{a_1, \ldots, a_{|B|}\}$. We can view the process of selecting $S$ as follows. Consider the
elements of $A$ in the order induced by the indices. For each element considered,
select the element with probability $p_i$, where $p_i$ depends on the selections
made up to this point.
At the point when $a_{i+1}$ is considered, let $t$ be the number of currently selected
points. Then $i - t$ is the number of points considered but not selected. Suppose
that $t/i < p$. Then $p_{i+1} = (k - t)/(n - i) > p$, since $k - t > k - pi = p(n - i)$.

We conclude that when we select the sample as a random subset of size $k$,
the probability that the number of sample points in $B$ is less than the expected
value is smaller than in the case where we select each point to be in the sample with
probability $p$. ■



3 Fixed Price Auction and Optimal Prices 

Consider the following fixed price auction. The bidders supply the bids and the
seller supplies the sale prices, $r_j$, $1 \le j \le m$. Define $c_{ij} = a_{ij} - r_j$. The auction
assigns each bidder $i$ to the item $j$ with the maximum $c_{ij}$, if the maximum is
nonnegative, and to no item otherwise. In case of a tie, we choose the item with
the maximum $j$. If a bidder $i$ is assigned item $j$, the corresponding sale price is
$r_j$.
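A direct transcription of the fixed price auction into Python may help. This is a sketch; the function name and the representation (bids as rows $a_{i1}, \ldots, a_{im}$, prices as a list $r_1, \ldots, r_m$, 0-based item indices, and `None` for a bidder who gets no item) are our own choices. Scanning items in increasing $j$ and accepting ties with `>=` implements the rule that ties go to the item with the maximum $j$.

```python
def fixed_price_auction(bids, prices):
    """Run the fixed price auction.

    bids[i][j] is a_ij and prices[j] is r_j.  Returns (assignment, revenue),
    where assignment[i] is the 0-based index of bidder i's item, or None.
    """
    assignment, revenue = [], 0.0
    for row in bids:
        best_j, best_c = None, None
        for j, (a, r) in enumerate(zip(row, prices)):
            c = a - r                                # c_ij = a_ij - r_j
            if best_c is None or c >= best_c:        # '>=' sends ties to larger j
                best_j, best_c = j, c
        if best_c is not None and best_c >= 0:
            assignment.append(best_j)
            revenue += prices[best_j]                # the sale price is r_j
        else:
            assignment.append(None)
    return assignment, revenue
```

For bids $(5, 3)$, $(2, 4)$, $(1, 1)$ and prices $(3, 3)$, the first two bidders win items 1 and 2 (indices 0 and 1 in the code), the third loses, and the revenue is 6.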

Lemma 2. Suppose the sale prices are set independently of the input bids. Then 
the fixed price auction is truthful. 

Proof. If bidder $i$ gets object $j$, the bidder's price is at least $r_j$ and the bidder's
profit is at most $u_{ij} - r_j$. The best possible profit for $i$ is $\max_j(u_{ij} - r_j)$. If the
bidder bids $a_{ij} = u_{ij}$, this is exactly the profit of the bidder. ■

Remark. Although we assume that the bidders do not see sale prices before
making their bids, the lemma holds even if the bidders do see the prices.

Now consider the following optimal pricing problem: given a set of bids, find
the set of prices such that the fixed price auction brings the highest revenue.
Suppose an auction solves this problem and uses the resulting prices. We call this
auction the optimal nontruthful single-price auction and denote its revenue by $\mathcal{F}$.
We can interpret $\mathcal{F}$ as the revenue of fixed pricing using perfect market analysis
or as the revenue of the optimal nontruthful single-price auction. The prices
depend on the input bids, and one can easily show this auction is nontruthful.

We use $\mathcal{F}$ to measure the performance of our truthful auctions. Although one
might think that being a single-price auction is a serious restriction, in the
single-item auction case this is not so. In this case, the revenue of the optimal
single-price auction is at least as big as the expected revenue of any reasonable¹
(possibly multiple-price) truthful auction; see [...].

Next we state the optimal pricing problem as a mathematical programming
problem. We start by stating the problem of finding a bidder-optimal object
assignment given the bids and the sale prices as an integer programming problem.
This problem is a special case of the b-matching problem² (bipartite, weighted,
and capacitated, with unit node capacities on one side and infinite capacities on
¹ See [...] for the precise definition of reasonable. The intuition is that we preclude
auctions that are tailored to specific inputs. Such an auction would perform well on
these specific inputs, but poorly on all others.
the other). For the limited supply case, when only one copy of an item is available,
the classical paper [...] takes a similar approach. For our case, this problem is
easy to solve by taking the maximum, as in the fixed price auction above. However, we then
treat sale prices as variables to get a mathematical programming formulation of
the optimal pricing problem.

One can show that the optimal price problem is equivalent to the following 
mathematical programming problem; we omit details. 



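$$
\begin{array}{lll}
\max & \sum_j \sum_i r_j x_{ij} & \\
\text{subject to} & r_m = 0 & \\
 & \sum_j x_{ij} = 1 & 1 \le i \le n \\
 & x_{ij} \ge 0 & 1 \le i \le n,\ 1 \le j \le m \\
 & p_i + r_j \ge a_{ij} & 1 \le i \le n,\ 1 \le j \le m \\
 & \sum_i p_i = \sum_j \sum_i x_{ij}(a_{ij} - r_j) &
\end{array}
\qquad (1)
$$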



This problem has a quadratic objective function; some constraints are linear, while
others are quadratic. Here $x_{ij}$ is one exactly when bidder $i$ gets item
$j$, and the $p_i$'s are the profits of the corresponding bidders.

Since $\sum_i \sum_j x_{ij} r_j = \sum_j \sum_i x_{ij} a_{ij} - \sum_i p_i \le \sum_j \sum_i a_{ij}$, the objective function
is bounded. Since the feasible region is closed, it follows that (1) always
has an optimal solution.

We omit proofs of the next two results. 

Lemma 3. For any solution of (1) with fractional $x_{ij}$'s there is a solution with
$x_{ij} \in \{0, 1\}$ and an objective function value that is at least as good.

Theorem 1. Consider sale prices defined by an optimal solution of (1). The
revenue of the fixed price auction that uses these prices and has bids $a_{ij}$ in the
input is equal to the objective function value of the optimal solution.

Recall that we use problem (1) to find a set of prices that maximizes
the fixed price auction revenue. In the rest of the paper we assume that we can
compute such prices and leave open the question of how to do this efficiently.
Note that we could also use an approximate solution.
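Since an approximate solution of (1) may be used, one crude, hypothetical stand-in for small instances is to evaluate the fixed price auction on every price vector from a finite candidate grid and keep the best. The Python sketch below does exactly that; it is exponential in $m$, the choice of grid (for example, the distinct bid values) is an assumption, and it need not find the true optimum of (1).

```python
from itertools import product


def fixed_price_revenue(bids, prices):
    """Revenue of the fixed price auction (ties go to the larger item index)."""
    total = 0.0
    for row in bids:
        best_j, best_c = None, None
        for j, a in enumerate(row):
            c = a - prices[j]                    # c_ij = a_ij - r_j
            if best_c is None or c >= best_c:
                best_j, best_c = j, c
        if best_c >= 0:
            total += prices[best_j]
    return total


def grid_search_prices(bids, grid):
    """Try every price vector in grid^m; return (best prices, their revenue).

    Exponential in m; only a hypothetical stand-in for solving program (1).
    """
    m = len(bids[0])
    best = max(product(grid, repeat=m), key=lambda p: fixed_price_revenue(bids, p))
    return best, fixed_price_revenue(bids, best)
```

For bids $(5, 0)$, $(0, 4)$, $(3, 3)$ and the grid $\{0, 3, 4, 5\}$, the best vector is $(5, 3)$ with revenue 11: the first bidder buys item 1 at 5, and the other two buy item 2 at 3.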



4 The Random Sampling Auction 

We use random sampling to make the optimal single-price auction truthful. 

The random sampling auction works as follows. 

1. Pick a random sample S of the set of bidders. Let N be the set of bidders 
not in the sample. 

2. Compute the optimal sale prices for S as outlined in the previous section. 

3. The result of the random sampling auction is then just the result of running 
the fixed-price auction on N using the sale prices computed in the previous 
step. All bidders in S lose the auction. 
The sample size is a tradeoff between how well the sample represents the input
and how much potential revenue is wasted because all bidders in the sample lose. 
Unless mentioned otherwise, we assume that the sample size is n/2 or, if n is 
odd, the floor or the ceiling of n/2 with probability 1/2. 
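The three steps, together with the sample-size rule, can be sketched in Python. This is a hedged sketch: step 2 calls for optimal prices for $S$, and here a brute-force grid search over the sample's own bid values is substituted as a hypothetical approximation (exponential in $m$); the helper names are illustrative.

```python
import random
from itertools import product


def assign(row, prices):
    """Fixed price auction rule: the item maximizing a_ij - r_j (ties go to the
    larger index), or None if even the best choice is negative."""
    best_j, best_c = None, None
    for j, a in enumerate(row):
        c = a - prices[j]
        if best_c is None or c >= best_c:
            best_j, best_c = j, c
    return best_j if best_c >= 0 else None


def revenue(bids, prices):
    """Total of the sale prices paid under the fixed price auction."""
    winners = (assign(row, prices) for row in bids)
    return sum(prices[j] for j in winners if j is not None)


def approx_optimal_prices(bids, m):
    """Hypothetical stand-in for step 2: the best price vector on a grid made of
    the group's own bid values (exponential in m)."""
    grid = sorted({a for row in bids for a in row}) or [0.0]
    return max(product(grid, repeat=m), key=lambda p: revenue(bids, p))


def random_sampling_auction(bids, rng):
    """Returns {bidder: None or (item, price)}."""
    n, m = len(bids), len(bids[0])
    # sample size n/2; for odd n, the floor or the ceiling with probability 1/2
    size = n // 2 + (rng.randint(0, 1) if n % 2 else 0)
    sample = set(rng.sample(range(n), size))                      # step 1
    prices = approx_optimal_prices([bids[i] for i in sample], m)  # step 2
    outcome = {}
    for i in range(n):                                            # step 3
        j = None if i in sample else assign(bids[i], prices)      # S always loses
        outcome[i] = None if j is None else (j, prices[j])
    return outcome
```

Because the prices depend only on the bids in $S$ and every bidder in $S$ loses, no bidder can influence the prices they face, which is the intuition behind the truthfulness result that follows.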

The facts that the bidders who determine the prices lose the auction and 
that the fixed price auction is truthful imply the following result. 

Lemma 4. The random sampling auction is truthful. 

Remark. Another natural way of sampling is to sample bids instead of bidders.
However, this does not seem to lead to a truthful auction, because a bidder's
bids selected in the sample may influence the price used to satisfy the bidder's
remaining bids.

Next we show that, under certain assumptions, the auction's revenue $\mathcal{R}$ is
within a constant factor of $\mathcal{F}$. Without loss of generality, for every $1 \le i \le n$,
$1 \le j \le m$, if $a_{ij}$ is undefined (not in the input) we define it to be zero. For
every bidder $i$, we view $(a_{i1}, \ldots, a_{im})$ as a point in the $m$-dimensional space and
denote this point by $v_i$. Thus $v_i$ is in the quadrant $Q$ of the $m$-dimensional space
where all coordinates are nonnegative. We denote the set of all input points by
$B$.

For a fixed $m$ and a set of sale prices $r_1, \ldots, r_m$, let $R_j$ be a region in the
$m$-dimensional space such that if $v_i \in R_j$, then $i$ prefers $j$ to any other item,
i.e., for any $1 \le k \le m$, $c_{ij} \ge c_{ik}$ (recall that $c_{ij} = a_{ij} - r_j$). We would like
$\{R_j : 1 \le j \le m\}$ to be a partitioning of $Q$. We achieve this by assigning
every boundary point to the highest-index region containing the point. (This is
consistent with our tie-breaking rule for the fixed price auction.) $R_j$ is a convex
(and therefore connected) region in $Q$. In fact, the region $R_j$ is as follows:



$$R_j = \{x : x_j \ge r_j \text{ and } x_j - r_j \ge x_k - r_k \ \forall k \ne j\}. \qquad (2)$$

Figure 1 shows a two item auction with prices $r_1$ and $r_2$ for items 1 and 2,
respectively. These prices induce the regions $R_1 = R_1' \cup R_1''$ and $R_2 = R_2' \cup R_2''$.
Arrows point to selling prices for the bidders in each region.

Thus sampling and computing the $r_j$'s partitions $Q$ into the regions, and each
bidder $i$ in $N$ gets the item corresponding to the region that $i$ is in. Intuitively, our
analysis says that if a region has many sample points, it must have a comparable
number of nonsample points, even though the regions are defined based on the
sample. The latter fact makes the analysis difficult by introducing conditioning.
We deal with the conditioning by considering regions defined by the
input independently of the sample.

For a given input, let $q_1, \ldots, q_m$ be a set of optimal prices for the input bids
that yield revenue $\mathcal{F}$. These prices induce the regions discussed above. Bidders
in region $R_j$ pay $q_j$ for item $j$. If we sample half of the points, the expected
number of sample points in a region $R_j$ is half of the total number of points
in the region, and for the prices $q_1, \ldots, q_m$, the expected revenue is $\mathcal{F}/2$. The
optimal fixed pricing on the sample does at least as well. Thus the expected
revenue of optimal fixed pricing of the sample, $E[\mathcal{F}_S]$, is at least $\mathcal{F}/2$. However,
we need a high-probability result. Our goal is to show that with high probability
$E[\mathcal{F}_S]$ is close to $\mathcal{F}/2$ and that $E[\mathcal{R}]$ is close to $E[\mathcal{F}_S]$, where $\mathcal{R}$ is the revenue of
the random sampling auction.

Fig. 1. Two item auction with regions $R_1$ and $R_2$

We say that a set $A \subseteq B$ is $t$-feasible if $A$ is nonempty and, for some set of sale
prices, $A$ is exactly the set of points in $R_t$. For each feasible set $A$, we define its
signature $S_A = (s_1, \ldots, s_m)$ such that the $s_j$'s are (not necessarily distinct) elements
of $A$ and, for a fixed $t$, different $t$-feasible sets have different signatures. In the
following discussion, $s_{ij}$ denotes the $j$-th coordinate of $s_i$.

We construct signatures as follows. Let $R_t$ be a region defining $A$. $R_t$ is
determined by a set of prices $(r_1, \ldots, r_m)$. We first increase all $r_j$'s by the same
amount (moving $R_t$ diagonally) until some point in $A$ is on the boundary of $R_t$.
Note that since we change all prices by the same amount, the limiting constraint
from (2) is $x_t \ge r_t$. Thus the stopping point is defined by $x_t = r_t$, and the point on
the boundary has the smallest $t$-th coordinate among the points in $A$. We set $s_t$
to this point.

Then for $j \ne t$, we move the region, starting at its current position,
down the $j$-th coordinate direction by reducing $r_j$ until the first point hits the
boundary. The boundary we hit is defined by $x_t - r_t = x_j - r_j$, and the point
that hits it first has the maximum $x_j - x_t + s_{tt}$ among the points in $A$. Observe
that the point $s_t$ remains on the boundary $x_t = r_t$, and therefore we stop before
$r_j$ becomes negative. When we stop, we take a point that hits the boundary and
assign $s_j$ to it.

Consider the set of points in the signature, $S_A = \{s_1, \ldots, s_m\}$. Define $R$ to
be the region we obtain at the end of the procedure that computed $S_A$. $R$ is defined
by

$$R = \{x : x_t \ge s_{tt} \text{ and } x_t - s_{jt} \ge x_j - s_{jj} \ \forall j \ne t\}.$$

It follows that $R$ can be constructed directly from $S_A$.
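The signature construction and the reconstruction of $R$ can be checked with a short Python sketch. It assumes integer coordinates (so the arithmetic is exact), uses the closed inequalities of (2) and ignores the tie-breaking rule for boundary points, and picks $s_t$ as a point of $A$ minimizing $x_t$ and $s_j$ as a point maximizing $x_j - x_t$; the resulting prices are $r_t = s_{tt}$ and $r_j = s_{jj} - s_{jt} + s_{tt}$. The function names are hypothetical.

```python
def region(prices, t):
    """Membership test for R_t in (2), with closed inequalities:
    x_t >= r_t and x_t - r_t >= x_k - r_k for every k != t."""
    def contains(x):
        if x[t] < prices[t]:
            return False
        return all(x[t] - prices[t] >= x[k] - prices[k]
                   for k in range(len(prices)) if k != t)
    return contains


def signature(A, t):
    """s_t minimizes x_t over A; for j != t, s_j maximizes x_j - x_t over A."""
    m = len(A[0])
    s = [None] * m
    s[t] = min(A, key=lambda x: x[t])
    for j in range(m):
        if j != t:
            s[j] = max(A, key=lambda x: x[j] - x[t])
    return s


def region_from_signature(s, t):
    """Rebuild R from the signature alone: r_t = s_tt, r_j = s_jj - s_jt + s_tt."""
    m = len(s)
    prices = [s[t][t] if j == t else s[j][j] - s[j][t] + s[t][t] for j in range(m)]
    return region(prices, t)
```

For any prices and any nonempty $A$ cut out by $R_t$, filtering the same input points through the rebuilt region returns exactly $A$: the rebuilt region contains $A$ and lies inside $R_t$, which is why different $t$-feasible sets must have different signatures.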

Figure 2 shows the signatures we get from the prices $r_1$ and $r_2$. The points
on the boundary of the shaded region are the signature of that region. Note,
for example, that there are no points in $R_1$ that are not inside the boundary
induced by the signature for $R_1$.




Fig. 2. Signatures in a two item auction 



Suppose two feasible sets have the same signature S and let R be the region 
defined by the signature. Then the two sets are exactly the set of points in R, 
and are thus identical. 

The next two lemmas are simple, so we omit the proofs. 

Lemma 5. For each $t$, $1 \le t \le m$, there are at most $n^m$ $t$-feasible sets.

Lemma 6. For every $t$-feasible subset $C$ of the sample $S$ there is a $t$-feasible
subset $A$ of the input such that $C = A \cap S$.

For $k \ge 1$ and $0 < \delta < 1$, we say that a sample $S$ is $(k, \delta)$-balanced if for every
$1 \le t \le m$ and for every $t$-feasible subset $A$ of the input such that $|A| \ge k$, we
have

$$(1-\delta) \le |A \cap S| / |A \cap N| \le 1/(1-\delta).$$

Lemma 7. The probability that a sample containing half of the input points is
$(k, \delta)$-balanced is at least $1 - 2mn^m \exp(-k\delta^2/8)$.
Proof. Lemma 1 implies that for a set $A$ with $|A| \ge k$,

$$\Pr[|A \cap S| < (1-\delta)|A \cap N|] < \exp(-k\delta^2/8)$$

and

$$\Pr[|A \cap N| < (1-\delta)|A \cap S|] < \exp(-k\delta^2/8).$$

Note that the fact that the number of sample points in one subset is close
to its expectation makes it no less likely that the number of sample points in
another subset is close to its expectation. Thus the conditioning we get is favorable.
By Lemma 5, there are at most $n^m$ $t$-feasible subsets for every $t$, so the total
number of feasible subsets is at most $mn^m$. These observations imply the lemma. ■

Theorem 2. Assume $a h m^2 \ln n \le \mathcal{F}$ and $m \ge 2$. Then $\mathcal{R} \ge \mathcal{F}/24$ with probability
at least $1 - \exp(-a/1728)$ (for some constant $a > 1$).

Proof. Consider Lemma 7 with $\delta = 1/3$ and $k = am\log n/12$. The probability
that the sample is $(k, \delta)$-balanced is at least

$$1 - 2mn^m \exp(-k\delta^2/8) = 1 - 2mn^m \exp(-am\log n/864) \ge 1 - \exp(-a/1728)$$

for $m \ge 2$. For the rest of the proof we assume that the sample is $(k, \delta)$-balanced;
we call this the balanced sample assumption.

Next we show that the revenue of the auction on the sample, $\mathcal{F}_S$, satisfies 
$\mathcal{F}_S \ge \mathcal{F}/6$. Let $Q_i$ be the set of bidders who get item $i$ when computing $\mathcal{F}$ on 
the entire bid set. Consider the sets $Q_i$ containing fewer than $(\alpha m \log n)/2$ bidders. 
The total contribution of such sets to $\mathcal{F}$ is less than $\mathcal{F}/2$. This is because there 
are at most $m$ such sets and each bid is at most $h$, giving a maximum possible 
revenue of $\alpha h m^2 \log n/2 \le \mathcal{F}/2$. Thus the contribution of the sets with at least 
$(\alpha m \log n)/2$ bidders is more than $\mathcal{F}/2$, and we restrict our attention to such 
sets. By the balanced sample assumption, at least a third of the points in each such set 
are sample points, and thus $\mathcal{F}_S \ge (1/3) \cdot \mathcal{F}/2 = \mathcal{F}/6$. 

Finally we show that $\mathcal{R} \ge \mathcal{F}/24$ using a similar argument. Let $R_i$ be the 
regions defined by the prices computed by the auction on the sample. Consider the 
regions containing fewer than $(\alpha m \log n)/12$ sample points. The total contribution 
of such regions to the revenue is less than $\mathcal{F}/12$. The remaining regions contribute at 
least $\mathcal{F}/12$ (out of $\mathcal{F}_S \ge \mathcal{F}/6$). Each remaining region contains at least $(\alpha m \log n)/12$ 
sample points. By the balanced sample assumption, each such region contains 
at least one nonsample point for every two sample points, and thus $\mathcal{R} \ge \mathcal{F}/24$. 



Lemma 0 and Theorem 0 imply that if the assumptions of the theorem hold, 
the random sampling auction is competitive. 

4.1 The Dual Price Auction 

The random sampling auction is wasteful in the sense that all bidders in the 
sample lose the auction. The dual price auction eliminates the waste by treating 
S and N symmetrically: S is used to compute sale prices for N and vice versa. 
Note that for each item, the two sale prices used are, in general, different; this 
motivates the name of the auction. 

By symmetry, the expected revenue of the dual price auction is twice the 
expected revenue of the single price auction with a sample of size $n/2$. Thus, under 
the conditions of Theorem 2, the dual price auction is competitive. 
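To make the dual price idea concrete, here is a minimal Python sketch for the special case of a single item ($m = 1$). In that case the optimal fixed price on a set of bids can be found by sorting: offering the $k$-th highest bid $b_k$ sells to at least $k$ bidders, so the best single-price revenue is $\max_k k\,b_k$.

Several details are illustrative assumptions of ours:
- the function names;
- the fair-coin split via `rng.random()`;
- the rule that a bidder buys when its bid is at least the offered price.

The multi-item auction would need an optimal fixed pricing routine in place of `optimal_single_price`.

```python
import random


def optimal_single_price(bids):
    """Best fixed price for one item, as (price, revenue).

    With bids sorted in decreasing order, price b_k sells to at least k
    bidders, and the maximum of k * b_k is the optimal fixed-price revenue.
    Returns (inf, 0) for an empty bid list (nobody can win).
    """
    best_price, best_revenue = float("inf"), 0
    for k, b in enumerate(sorted(bids, reverse=True), start=1):
        if k * b > best_revenue:
            best_price, best_revenue = b, k * b
    return best_price, best_revenue


def dual_price_auction(bids, rng=random):
    """Single-item dual price auction (illustrative sketch).

    Bidders are split into S and N by independent fair coin flips; the
    optimal price computed on S is offered to N and vice versa, so no
    bidder's own bid influences the price it faces.
    Returns {bidder index: price paid}.
    """
    sample, nonsample = [], []
    for i in range(len(bids)):
        (sample if rng.random() < 0.5 else nonsample).append(i)
    price_for_n, _ = optimal_single_price([bids[i] for i in sample])
    price_for_s, _ = optimal_single_price([bids[i] for i in nonsample])
    winners = {}
    for group, price in ((nonsample, price_for_n), (sample, price_for_s)):
        for i in group:
            if bids[i] >= price:
                winners[i] = price
    return winners
```

Unlike the random sampling auction, bidders in the sample are not discarded: they face the price computed from the nonsample.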

5 A Deterministic Auction 

The following auction is a generalization of the deterministic optimal threshold 
auction introduced in jOj to the multi-item case. Although not competitive in 
general, the single-item variant of this auction works well when the input is 
non-pathological, e.g., when bidder utilities are selected independently from the 
same distribution. 

The deterministic auction determines which item, if any, bidder $i$ gets 
as follows. It deletes $i$ from $B$, computes optimal prices for the remaining bid- 
ders, and then chooses the most profitable item for $i$ under these prices. This 
is done independently for each bidder. This auction is truthful but, as we have 
mentioned, not competitive in some cases. 
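The single-item variant mentioned above, the deterministic optimal threshold auction, is short enough to sketch in Python. This is an illustration under our own assumptions, not the multi-item auction itself. The function names are ours, and a bidder is assumed to win when its bid is at least its threshold. Each bidder faces the optimal fixed price computed with its own bid removed.

```python
def optimal_single_price(bids):
    """Best fixed price for one item on the given bids, as (price, revenue)."""
    best_price, best_revenue = float("inf"), 0
    for k, b in enumerate(sorted(bids, reverse=True), start=1):
        if k * b > best_revenue:
            best_price, best_revenue = b, k * b
    return best_price, best_revenue


def deterministic_threshold_auction(bids):
    """Bidder i faces the optimal fixed price of everyone else's bids, so its
    own bid never affects its price (the source of truthfulness).
    Returns {bidder index: price paid}."""
    winners = {}
    for i, b in enumerate(bids):
        price, _ = optimal_single_price(bids[:i] + bids[i + 1:])
        if b >= price:
            winners[i] = price
    return winners


print(deterministic_threshold_auction([10, 6, 5, 1]))  # {0: 5}
```

On this toy input, the optimal fixed price of 5 would earn 15, yet only the top bidder wins. For each of the other bidders, the threshold computed without its bid (10, 6 and 5, respectively) exceeds its bid. This illustrates how the revenue of this auction can fall well short of the optimum.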

6 Concluding Remarks 

Our analysis of the random sampling auction is somewhat brute-force, and a 
more careful analysis may lead to better results, both in terms of constants 
and in terms of asymptotic bounds. In particular, the assumption $\alpha h m^2 \ln n \le \mathcal{F}$ 
in Theorem 2 may be stronger than necessary. One can prove that $\mathcal{F}_S = 
\Omega(\mathcal{F})$ assuming $\alpha h m \le \mathcal{F}$. We wonder whether the theorem holds under this weaker 
assumption. 

Although our theoretical bounds require m to be small compared to n and the 
optimal fixed price solution to contain a large number of items, it is likely that 
in practice our auctions will work well for moderately large m and moderately 
small optimal fixed price solutions. This is because our analysis is worst-case. 
In many real-life applications, bidder utilities for the same item are closely 
correlated, and our auctions perform better. 

The optimal fixed pricing problem has a very special form that may allow 
one to solve it efficiently. Note that if one uses an approximation 
algorithm to solve the problem (say, within 2% of the optimum), our auctions 
remain truthful. (This is in contrast to combinatorial auctions jH|.) It is possible 
that in practice this problem can be solved approximately, in reasonable time, 
using general nonlinear optimization techniques. We leave the existence of such 
an algorithm as an open problem. 

Another open problem is a generalization of our results. One possible general- 
ization is to the case when some items are in fixed supply. Another generalization 
is to the case when consumer $i$ wants up to $k_i$ items. 
Abstract. Content-Based Multicast is a type of multicast where the 
source sends a set of different classes of information and not all the sub- 
scribers in the multicast group need all the information. Use of filtering 
publish-subscribe agents on the intermediate nodes was suggested Q to 
filter out the unnecessary information on the multicast tree. However, 
filters have their own drawbacks, like processing delays and infrastruc- 
ture cost. Hence, it is desirable to place these filters as efficiently as possible. An 
$O(n^2)$ dynamic programming algorithm was proposed to calculate the 
best locations for filters that would minimize overall delays in the net- 
work 13. We propose an improvement of this algorithm which exploits 
the geometry of piecewise linear functions and fast merging of sorted 
lists, represented by height-balanced search trees, to achieve $O(n \log n)$ 
time complexity. We also show an improvement of this algorithm which 
runs in $O(n \log h)$ time, where $h$ is the height of the multicast tree. This 
problem is closely related to p-median and uncapacitated facility loca- 
tion over trees. Theoretically, this is an uncapacitated analogue of the 
p-inmedian problem on trees as defined in [3]. 



1 Introduction 

There has been a surge of interest in the delivery of personalized information 
to users as the amount of information readily available from sources like the 
WWW increases. When the number of information recipients is large and there is 
sufficient commonality in their interests, it is worthwhile to use multicast rather 
than unicast to deliver information. But if the interests of recipients are not 
sufficiently common, there could be huge redundancy in traditional IP multicast. 
As a solution to this, Content-Based Multicast (CBM) was proposed [bibj, in which 
extra content filtering is performed at the interior nodes of the multicast tree so 
as to reduce network bandwidth usage and delivery delay. This kind of filtering 
is performed either at the IP level or, more likely, at the software level, e.g., in 
applications such as publish-subscribe [2] and event-notification systems P|. 

Essentially, CBM reduces network bandwidth and recipient computation at 
the cost of increased computation in the network. CBM at the application level 
is increasingly important as the quantity and diversity of information being 
disseminated in information systems and networks like the Internet increases, 
and users suffering from information overload desire personalized information. 
A form of CBM is also useful at the middleware level m and at the network signaling 
level. Previous work applies CBM to issues in diverse areas mm. 

Q addresses the problem of providing an efficient matching algorithm suit- 
able for a content-based subscription system. [5] addresses the problem of match- 
ing the information being multicast with that desired by the leaves. 0 pro- 
poses mobile filtering agents to perform filtering in the CBM framework. They con- 
sider four main components of these systems: subscription processing, matching, 
filtering, and efficiently moving the filtering agents within the multicast tree. 

0 evaluates the situations in which CBM is worthwhile. It assumes that the 
multicast tree has been set up using appropriate methods, and concentrates on 
efficiently placing the filters within that multicast tree. It also gives a mathe- 
matical model of the optimization framework. The problem considered is that of 
placing the filters under two criteria: 

— Minimize the total bandwidth utilization in the multicast tree, with the re- 
striction that at most p filters are allowed to be placed in the tree. This is 
similar to the p-median problem on trees. An optimal $O(pn^2)$ dynamic pro- 
gramming algorithm was described. 

— Minimize total delivery delay over the network, with no restriction on num- 
ber of filters, assuming that the filters introduce their own delays F and the 
delay on a link of the multicast tree is proportional to the amount of traffic 
on that link. This means that although filters have their own delays, 
they can effectively reduce traffic and hence delays. This problem is simi- 
lar to uncapacitated facility location on trees. An optimal $O(n^2)$ dynamic 
programming algorithm was described. 

In this paper, we consider the second formulation (minimizing delay) and 
show that the complexity of the dynamic programming algorithm can be im- 
proved. We are not concerned with the construction of the multicast tree 
or with the processing of subscriptions. We also assume that the minimum required 
amount of traffic at each node of the multicast tree is known; we consider this to 
be part of subscription processing, which could be done by taking unions of 
subscription lists bottom-up on the multicast tree, or by probabilistic estimation. 
Given these, we focus on the question of where to place filters to minimize the 
total delay. We also assume that the filters do as much filtering as possible 
and do not allow any extra information traffic on subsequent links. 
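The bottom-up union mentioned above can be sketched as follows. The sketch assumes (our modeling choice, with hypothetical names) that each leaf subscribes to a set of information classes with known sizes. Under that assumption, $f(v)$ is the total size of the union of the classes requested in $Tree(v)$.

```python
def required_flow(children, subscriptions, class_size, root):
    """Compute f(v) for every vertex by unioning leaf subscriptions bottom-up.

    children: dict vertex -> list of child vertices (leaves may be omitted)
    subscriptions: dict leaf -> set of requested information classes
    class_size: dict class -> size of that class
    Returns dict vertex -> f(v), the total size of classes needed in Tree(v).
    """
    f = {}

    def visit(v):
        kids = children.get(v, [])
        if not kids:
            needed = set(subscriptions.get(v, ()))  # copy: never mutate input
        else:
            needed = set()
            for c in kids:
                needed |= visit(c)
        f[v] = sum(class_size[k] for k in needed)
        return needed

    visit(root)
    return f
```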

In Section 2, we present the formulation of the problem and discuss the pre- 
liminaries and assumptions. We go over the dynamic programming algorithm 
described in ^ and give the intuition that motivates faster algorithms. In Sec- 
tions 3 and 4 we describe two algorithms, the first being an improvement of 
the dynamic programming algorithm and the second an improvement of the first, 
together with the analysis of their running times. In Section 5 we describe 
the data structure for piecewise linear functions and the required operations on it, 
along with their complexity bounds. In Section 6 we consider further extensions 
and related problems. 
2 Preliminaries 

2.1 Notations 

A filter placement on a rooted multicast tree $M = (V, E)$ with vertex set $V$ and 
edge set $E \subseteq V \times V$ is a set $S \subseteq V$, where filters are placed at all vertices in $S$ 
and at no vertex in $V - S$. Let $|V| = N$, so $|E| = N - 1$. We denote the root 
of $M$ by $r$. $Tree(v)$ denotes the subtree rooted at vertex $v \in V$; for example, 
$Tree(r) = M$. We denote the height of the tree $M$ by $H$. 

For simplicity of writing, we use some functional notation. We de- 
note the size of $Tree(v)$, that is, the number of vertices in $Tree(v)$, by $n(v)$. Thus, 
$|Tree(r)| = n(r) = N$. $c(v)$ denotes the number of children of vertex $v \in V$, 
while $s(v)$ denotes the number of leaves in $Tree(v)$. For example, $c(v) = 0$ and 
$s(v) = 1$ if $v$ is a leaf in $M$. 

$f(v)$ is the total size of the information requested in $Tree(v)$. For a leaf $v \in V$, 
$f(v)$ denotes the size of the information requested by that user. In other words, 
$f(v)$ is also the amount of information that node $v$ gets from its parent if the parent 
has a filter. We assume that $f(v)$ is known for each $v$. 

2.2 Assumptions 

We make the following assumptions in our model. 

— The delay on a link is proportional to the length of the message transmitted 
across the link, ignoring propagation delay. Thus if m is the length of the 
message going across a link (or an edge), then the delay on that link is mL 
units, where the link delay per unit of data is L, a constant. 

— The delay introduced by an active filter is a constant, which we denote by $F$. It 
is typically a big constant.¹ 

— Each internal vertex of M waits and collects all incoming information before 
forwarding it to its children, but this waiting time is much smaller than the 
delay over the link. 



2.3 Recurrence and Dynamic Programming 

Our objective is to minimize the average delay from the instant the source mul- 
ticasts the information to the instant a leaf receives it. Since the number of 
leaves is a constant for a given multicast tree $M$, we can equivalently minimize the 
total delay, summed over all leaves in $M$. 

Let $A(v)$ stand for the lowest ancestor of $v$ whose parent has a filter. For 
example, $A(v) = v$ if the parent of $v$ has a filter. 

Now consider a CBM with a required flow $f$ known for each vertex in the 
tree. For a vertex $v$, let $D(v,p)$ denote the minimum total delay in $Tree(v)$, 
assuming $A(v) = p$. Let $v_1, v_2, \ldots, v_{c(v)}$ be the children of vertex $v$. Let 
$C_v = s(v)F + L\sum_{i=1}^{c(v)} f(v_i)\,s(v_i)$ and $E_v = L\,s(v)$. $C_v$ and $E_v$ are constants and can 
be computed for all vertices $v$ in $O(N)$ time by a bottom-up calculation. 

¹ It is not necessary to assume that $L$ and $F$ are the same constants for each link and each 
filter location; they could be different constants for different links and locations, as 
in the general formulation of the p-median problem [3]. This does not change the algorithm. 

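Then the minimum total delay can be expressed by the following recurrence 
relation: 

$$D(v,p) = \begin{cases} 0, & \text{if } v \text{ is a leaf (for all } p\text{)},\\ \min\Big\{\, C_v + \sum_{i=1}^{c(v)} D(v_i, v_i),\; f(p)\,E_v + \sum_{i=1}^{c(v)} D(v_i, p) \Big\}, & \text{otherwise,} \end{cases}$$

where the first term of the minimum corresponds to placing a filter at $v$ and the 
second to not placing one. 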



The optimal placement can be found using dynamic programming, as noted 
in 1^. However, a naive implementation takes $O(NH)$ time. 
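For reference, the discrete recurrence can be transcribed directly into a memoized Python function. This is a sketch of the naive approach only, not of the faster algorithms in Sections 3 and 4.

Assumptions and scope of the sketch:
- The names are hypothetical.
- The root is assumed to receive flow $I$.
- The link from $v$ to each child is charged at $v$, as in $C_v$ and $E_v$ above.
- It returns only the minimum total delay; recording which branch of the minimum was taken would recover the placement.

Because the incoming flow at $v$ is $f(A(v))$ (or $I$), each vertex is evaluated for at most $H + 1$ distinct flows. This gives $O(NH)$ states.

```python
from functools import lru_cache


def min_total_delay(children, f, root, I, F, L):
    """Naive DP over the recurrence D(v, p), memoized on (v, incoming flow).

    children: dict vertex -> list of children (leaves may be omitted)
    f: dict vertex -> required flow f(v); I: flow entering the root
    F: filter delay; L: link delay per unit of data
    """
    leaves = {}  # s(v)

    def count_leaves(v):
        kids = children.get(v, [])
        leaves[v] = sum(count_leaves(c) for c in kids) if kids else 1
        return leaves[v]

    count_leaves(root)

    @lru_cache(maxsize=None)
    def D(v, flow):
        kids = children.get(v, [])
        if not kids:
            return 0
        # Filter at v: C_v = s(v)F + L * sum_i f(v_i)s(v_i); child i then sees f(v_i).
        with_filter = leaves[v] * F + sum(
            L * f[c] * leaves[c] + D(c, f[c]) for c in kids)
        # No filter: each of the s(v) leaves pays L per unit of the incoming flow.
        without_filter = flow * L * leaves[v] + sum(D(c, flow) for c in kids)
        return min(with_filter, without_filter)

    return D(root, I)
```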

We will “undiscretize” the above recurrence relation and write it as a function 
of a real number $p$, which is now the incoming information flow into $v$. To make 
it clear that this function is specific to a vertex $v$, we denote it by $D_v$. The 
recurrence now takes the following form: 

$$D_v(p) = \begin{cases} 0, & \text{if } v \text{ is a leaf (for all } p\text{)},\\ \min\Big\{\, C_v + \sum_{i=1}^{c(v)} D_{v_i}(f(v_i)),\; p\,E_v + \sum_{i=1}^{c(v)} D_{v_i}(p) \Big\}, & \text{otherwise.} \end{cases}$$



Notice that we can still compute $D(v,p)$ by plugging the discrete value $f(p)$ 
into $D_v$. Intuitively, $D_v$ is a function of the real value $p$, which is the incoming flow to 
$Tree(v)$. It is a piecewise linear non-decreasing function. Each breakpoint 
in the function indicates a change in the arrangement of filters in the subtree 
$Tree(v)$. This change occurs in order to reduce the rate of increase (slope) of $D_v$ 
for higher values of $p$. The slope of each segment is less than that of the previous one, and 
the slope of the final segment (an infinite ray) is zero because it corresponds to a 
filter at $v$: once a filter is placed at $v$, the value of $p$ no longer matters. 
Therefore, $D_v$ is a piecewise linear non-decreasing concave function. We use 
$|D_v|$ to denote the number of breakpoints (or number of linear pieces) 
in $D_v$. 
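These properties are easy to check numerically on small trees. The Python sketch below (hypothetical names) evaluates the real-valued recurrence for $D_v$ directly, at a cost linear in the size of $Tree(v)$ per evaluation. It then samples $D_v$ on an evenly spaced grid of $p$ values and tests that the samples are non-decreasing and concave (non-positive second differences). It is a sanity check on examples, not a proof.

```python
def make_D(children, f, F, L):
    """Return D(v, p), the real-valued recurrence for D_v(p), evaluated directly."""
    memo_leaves = {}

    def s(v):  # number of leaves in Tree(v)
        if v not in memo_leaves:
            kids = children.get(v, [])
            memo_leaves[v] = sum(s(c) for c in kids) if kids else 1
        return memo_leaves[v]

    def D(v, p):
        kids = children.get(v, [])
        if not kids:
            return 0.0
        C_v = s(v) * F + L * sum(f[c] * s(c) for c in kids)
        E_v = L * s(v)
        with_filter = C_v + sum(D(c, f[c]) for c in kids)      # filter at v
        without_filter = p * E_v + sum(D(c, p) for c in kids)  # no filter at v
        return min(with_filter, without_filter)

    return D


def check_shape(D, v, grid, tol=1e-9):
    """Return (non-decreasing?, concave?, last sample) for D(v, .) sampled
    on an evenly spaced grid of p values."""
    ys = [D(v, p) for p in grid]
    nondecreasing = all(b >= a - tol for a, b in zip(ys, ys[1:]))
    concave = all(ys[i - 1] - 2 * ys[i] + ys[i + 1] <= tol
                  for i in range(1, len(ys) - 1))
    return nondecreasing, concave, ys[-1]
```

If the last sample equals $D_v$ at a much larger $p$, the final ray has slope zero, as the argument above predicts.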

The big advantage of the above formulation is that it allows us to store $D_v$ as 
a height-balanced binary search tree, which in turn allows efficient probing and 
merging, so that we can implement the above recurrence and find optimal filter 
placements faster. 
Before we proceed with the actual description of algorithms and data structures, 
we present two simple lemmas that prove the properties of $D_v$ claimed 
above. 

Lemma 1. $D_v$ is a piecewise linear non-decreasing function, and there exist $p_v, t_v$ such 
that $D_v(p) = t_v$ for all $p \ge p_v$. 

Proof: By induction on the height² of $v$. The claim is trivially true for a leaf $v$: $D_v = 0$, 
so $p_v = 0$ and $t_v = 0$. Assume the claim is true for all $c(v)$ children of $v$. 
Let $L_v(p) = p\,E_v$; $L_v$ is a line passing through the origin, hence a piecewise 
linear non-decreasing function. 

Let $W_v(p) = C_v + \sum_{i=1}^{c(v)} D_{v_i}(f(v_i))$. $W_v$ is a constant and hence a piece- 
wise linear non-decreasing function. Let $F_v(p) = L_v(p) + \sum_{i=1}^{c(v)} D_{v_i}(p)$; as a sum of 
piecewise linear non-decreasing functions, $F_v$ is a piecewise linear non-decreasing function. 
Now $D_v(p) = \min\{W_v(p), F_v(p)\}$. Therefore, $D_v$ is a piecewise linear non-decreasing 
function, because the minimum preserves the piecewise linear non-decreasing property; 
moreover, $t_v = W_v(p)$, and $p_v$ is the value of $p$ where $W_v$ and $F_v$ intersect. □ 




Lemma 2. $|D_v| \le n(v)$. 

Proof: By induction on the height of $v$. The claim is trivially true for $v$ if it is a leaf, 
that is, if its height is 0. If the claim holds for each of the $c(v)$ children of $v$, then 
each $D_{v_i}$ is a piecewise linear function made up of at most $n(v_i)$ different 
linear pieces. $D_v$ is the minimum of a constant and the sum of the $D_{v_i}$ and the line $p\,E_v$. 
The breakpoints of a sum lie among the breakpoints of its summands, adding a line creates 
none, and taking the minimum with the constant adds at most one extra piece. So the number of 
linear pieces in $D_v$ is at most $1 + \sum_{i=1}^{c(v)} n(v_i)$, which is precisely $n(v)$. □ 

It is apparent from the above proof that each breakpoint in $D_v$ is introduced 
by some node in $Tree(v)$ as a result of the “min” operation. 

3 Algorithm-1 

We are now ready to present our first algorithm, Algorithm-1. Let $I$ be the 
total amount of incoming information at $r$. The function $A(r)$ returns the 
piecewise linear function $D_r$ at the root $r$. Each $D_v$ is stored as a balanced binary search 
tree whose size equals the number of breakpoints in $D_v$. 

² The height of $v$ is $1 + \max\{\text{height of } u \mid u \text{ is a child of } v\}$; the height of a leaf is 0. 
3.1 Algorithm 

Algorithm-1 { 
    A(r); 
    M-DFS(r, I); 
} 

M-DFS(v, p) { 
    if c(v) == 0 then 
        return; 
    end if 
    if p > p_v then 
        place filter at v; 
        for i = 1 to c(v) do 
            M-DFS(v_i, f(v_i)); 
        end for 
    else 
        for i = 1 to c(v) do 
            M-DFS(v_i, p); 
        end for 
    end if 
    return; 
} 

A(v) { 
    if c(v) == 0 then 
        p_v = +∞; 
        return create(0, 0); 
    else 
        for i = 1 to c(v) do 
            q_i = A(v_i); 
        end for 
        t_v = C_v; 
        for i = 1 to c(v) do 
            t_v = t_v + probe(q_i, f(v_i)); 
        end for 
        z = create(E_v, 0); 
        for i = 1 to c(v) do 
            z = add_merge(z, q_i); 
        end for 
        p_v = truncate(z, t_v); 
        return z; 
    end if 
} 



3.2 Data Structure Operations 

The data structure supports the following operations: 

create$(a, b)$: Returns a new function with equation $y = ax + b$ in $O(1)$ time. 

probe$(q, t)$: Returns $q(t)$ in $O(\log|q|)$ time. 

add_merge$(q_1, q_2)$: Returns a piecewise linear function which is the sum of $q_1$ 
and $q_2$. Assuming without loss of generality $|q_1| \ge |q_2| \ge 2$, the running time 
is $O(|q_2| \log((|q_1| + |q_2|)/|q_2|))$. $q_1$ and $q_2$ are destroyed during this operation, and the 
new function has size at most $|q_1| + |q_2|$. 

truncate$(q, t)$: This assumes that some $z$ with $q(z) = t$ exists. It modifies $q$ to a 
function $q'$ which equals $q(x)$ for $x \le z$ and $t$ for $x > z$; this destroys $q$. 
It returns $z$. $q'$ has at most one more breakpoint than $q$. All the breakpoints 
in $q$ after $z$ are deleted (except the one at $+\infty$). The running time is $O(\log|q|)$ for 
the search plus $O(\log|q|)$ per deletion. 
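The operations above and Algorithm-1 can be sketched in Python as follows. This is a simplified sketch under our own assumptions, and all names are hypothetical.

How the sketch works:
- A function is a sorted Python list of breakpoints $(x, A, B)$. Each breakpoint carries the segment $y = Ax + B$ to its left, and the last breakpoint is at $+\infty$, mirroring the representation described in Section 5.
- add_merge is a plain linear-time merge rather than the balanced-tree merge, and truncate scans linearly, so the $O(N \log N)$ bound does not apply to this sketch.
- `algorithm_1` returns the minimum total delay $D_r(I)$ together with the set of vertices that receive filters.

```python
from bisect import bisect_left

INF = float("inf")

# A function is a list of breakpoints (x, A, B) sorted by x; the segment to
# the LEFT of breakpoint x is y = A*x + B, and the last breakpoint is at +inf.


def create(a, b):
    """The line y = a*x + b."""
    return [(INF, a, b)]


def probe(q, t):
    """q(t), using the segment stored at the first breakpoint x >= t."""
    _, a, b = q[bisect_left(q, (t,))]
    return a * t + b


def add_merge(q1, q2):
    """Sum of two functions by a linear merge of their breakpoint lists."""
    out, i, j = [], 0, 0
    while i < len(q1) and j < len(q2):
        x1, a1, b1 = q1[i]
        x2, a2, b2 = q2[j]
        x = min(x1, x2)
        out.append((x, a1 + a2, b1 + b2))  # segments left of x add up
        if x1 == x:
            i += 1
        if x2 == x:
            j += 1
    return out


def truncate(q, t):
    """Return (min(q, t), z) for non-decreasing q, where q(z) = t."""
    for i, (x, a, b) in enumerate(q):
        right_end = (INF if a > 0 else b) if x == INF else a * x + b
        if right_end >= t:
            break
    else:
        raise ValueError("no z with q(z) = t")
    z = (t - b) / a if a > 0 else (q[i - 1][0] if i > 0 else 0.0)
    return q[:i] + [(z, a, b), (INF, 0.0, t)], z


def algorithm_1(children, f, root, I, F, L):
    """Return (minimum total delay, set of filter vertices)."""
    p_cut = {}  # p_v: place a filter at v iff its incoming flow exceeds p_v

    def A(v):  # returns (D_v, s(v))
        kids = children.get(v, [])
        if not kids:
            p_cut[v] = INF  # never filter at a leaf
            return create(0.0, 0.0), 1
        subs = [A(c) for c in kids]
        s_v = sum(s for _, s in subs)
        C_v = s_v * F + L * sum(f[c] * s for c, (_, s) in zip(kids, subs))
        t_v = C_v + sum(probe(q, f[c]) for c, (q, _) in zip(kids, subs))
        z = create(L * s_v, 0.0)  # the line p * E_v
        for q, _ in subs:
            z = add_merge(z, q)
        z, p_cut[v] = truncate(z, t_v)
        return z, s_v

    def m_dfs(v, p, filters):
        kids = children.get(v, [])
        if p > p_cut[v]:
            filters.add(v)
            for c in kids:
                m_dfs(c, f[c], filters)
        else:
            for c in kids:
                m_dfs(c, p, filters)

    D_root, _ = A(root)
    filters = set()
    m_dfs(root, I, filters)
    return probe(D_root, I), filters
```

Since leaves get $p_v = +\infty$, M-DFS never places a filter at a leaf. The returned delay can be cross-checked against a direct evaluation of the discrete recurrence on small trees.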

3.3 Analysis of Algorithm-1 

Algorithm - 1 first recursively builds up the piecewise linear function Dy , bottom 
up, by calling A(r). It uses py values stored for each v in the tree and runs 
simple linear time Depth-First-Search algorithm to decide filter placement at 
each vertex of the tree. 

We will now show that the total running time of algorithm A, and therefore 
Algorithm- 1 , is 0 {N log N). There are three main operations which constitute 
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the running time: probe, truncate and add-merge. Over the entire algorithm, 
we do N probes each costing time < log TV because each probe is nothing but 
a search in a binary search tree, truncate involves N search operations and 

< N deletions (because each break-point is deleted only once and there is only 
one break-point each node in the multicast tree can introduce) each costing 

< logA^ time. Therefore, truncate and probe cost 0{N log N) time over the 
entire algorithm. We still need to show that total time taken by all the merge 
operations is 0{N log N). The following lemma proves this. 

Lemma 3. The total cost of merging in the calculation of $D_v$ is at most $n(v)\log n(v)$. 

Proof: We proceed by induction on the height of $v$. The claim is trivially true 
for all leaf nodes, since there are no merge operations to be done. At any in- 
ternal node $v$, to obtain $D_v$ we merge $D_{v_1}, D_{v_2}, \ldots, D_{v_{c(v)}}$ sequentially. Let 
$s_i = \sum_{j=1}^{i} n(v_j)$. Now, again by induction (a new induction on the number of 
children of $v$), assume that the time to obtain the merged function of the first 
$i$ functions $D_{v_j}$ is at most $s_i \log s_i$. The base case $i = 1$ is true by the outer induction. 
Then, assuming without loss of generality $s_i \ge n(v_{i+1})$, 
the total time to obtain the merged function of the first $i + 1$ functions is at most 

$$s_i \log s_i + n(v_{i+1}) \log n(v_{i+1}) + n(v_{i+1}) \log\big((s_i + n(v_{i+1}))/n(v_{i+1})\big),$$

which is at most $s_{i+1} \log s_{i+1}$, since the last two terms sum to $n(v_{i+1}) \log s_{i+1}$ and 
$s_i \log s_i \le s_i \log s_{i+1}$. Since $s_{c(v)} < n(v)$, the time taken to merge all the children at $v$ and 
obtain $D_v$ is at most $n(v)\log n(v)$. □ 



4 Algorithm-2 

4.1 Motivation 

We observe that Lemma 2 gives a bound of $n(v)$ on the number of different 
linear pieces in $D_v$. On the other hand, we need to probe and evaluate $D_v$ at 
no more than $H$ different values (the number of ancestors $v$ can have). This 
suggests that we can gain by “converting” functions that grow “bigger”, 
having more than $H$ breakpoints, into functions with at most $H$ breakpoints. 

For example, consider the case where the multicast tree $M$ is a balanced binary 
tree. Let $Y$ be the set of nodes at height $\log\log N$. For each $v \in Y$ the subtree size 
$n(v)$ is roughly $\log N$, $|Y|$ is roughly $N/\log N$, and by Lemma 3 computing $D_v$ at each such 
$v$ takes $O(\log N \log\log N)$ time. This makes it $O(N \log\log N)$ over all $v \in Y$. Now, we 
convert $D_v$ into array form as in dynamic programming and resume the previous 
dynamic programming algorithm in jO]. These dynamic programming calculations 
occur at roughly $N/\log N$ nodes, each taking $O(\log N)$ time (here $H = \log N$). Hence we 
reduce the total running time to $O(N \log\log N)$, 
which is essentially $O(N \log H)$. However, to achieve this in the general case, we still 
have to use the binary search tree representation in the second phase. The 
advantage is that the size of the binary search tree never grows beyond $H$. 
4.2 Data Structure Operations 

Before we proceed with this new algorithm, we consider “converted” functions, 
since some of the functions will be evaluated only at a small number of 
values. A “converted” function $q_X(x)$ for $q(x)$ with respect to a (sorted) set 
$X = \{x_1, x_2, \ldots, x_k\}$ is a piecewise linear function such that $q_X(x_i) = q(x_i)$ for 
$x_i \in X$, and which is linear in between those values. We define the following 
operations: 

convert$(q, X)$: Returns the “converted” function $q_X$ in $O(k \log(|q|/k))$ time. Assumes 
$X$ is sorted. 

add-dissolve$(q_X, g)$: Adds the function $g$ to the converted function $q_X$ and returns 
the resulting function, dissolved w.r.t. the set $X$. Running time: $O(|g| \log |q_X|)$. 

add-collide$(q_{X_1}, g_{X_2})$: Adds two converted functions $q_{X_1}$ and $g_{X_2}$, creating a new 
converted function only on $X_1 \cap X_2$. 

truncate-converted$(f_X, t)$: Almost the same as truncate, but it does not cause any 
deletions; instead, it uses a “mark” to keep track of invalid values in the data struc- 
ture. Running time: $O(\log|f_X|)$. 
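A deliberately naive Python sketch of the semantics of converted functions follows. Functions are plain Python callables, and a converted function is a list of $(x_i, q(x_i))$ pairs, so add-dissolve here costs $O(|X|)$ evaluations of $g$ rather than the $O(|g|\log|q_X|)$ bound above. The names mirror the operations, but the representation is our assumption, and truncate-converted is omitted.

```python
from bisect import bisect_left


def convert(q, X):
    """Converted function q_X: the values of q (any callable) on sorted X."""
    return [(x, q(x)) for x in X]


def evaluate(qx, p):
    """Evaluate q_X at p: exact on X, linear interpolation in between."""
    xs = [x for x, _ in qx]
    i = bisect_left(xs, p)
    if i < len(xs) and xs[i] == p:
        return qx[i][1]
    if i == 0 or i == len(xs):
        raise ValueError("p lies outside the range of X")
    (x0, y0), (x1, y1) = qx[i - 1], qx[i]
    return y0 + (y1 - y0) * (p - x0) / (x1 - x0)


def add_dissolve(qx, g):
    """Add an unconverted function g to q_X; the result stays on the same X."""
    return [(x, y + g(x)) for x, y in qx]


def add_collide(qx1, gx2):
    """Add two converted functions; the result lives on the common points."""
    values2 = dict(gx2)
    return [(x, y + values2[x]) for x, y in qx1 if x in values2]
```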



4.3 Description of Algorithm 

Using this, we modify the implementation of Algorithm-1. We continue building 
the $D_v$'s bottom-up as in algorithm $A(v)$. Suppose we are at some $v$ all of whose 
$c(v)$ children, namely $v_1, v_2, \ldots, v_{c(v)}$, have fewer than $H$ breakpoints in their 
respective data structures. We now start building $D_v$ by merging $D_{v_1}, D_{v_2}, \ldots$ 
one by one. Suppose that after merging $D_{v_1}$ through $D_{v_i}$ for the first $i$ children, we 
find that the number of breakpoints of the function $q$ constructed so far exceeds 
$H$ (it is trivially less than $2H$); then we call convert$(q, X)$. Here 
$X$ is the sorted list of the values $f(p)$ of $v$'s ancestors. Note that $X$ is very easy to 
maintain due to the recursive top-down calls in $A(v)$ and the monotonicity of the $f(p)$ values 
along a path. Then, for the remaining children of $v$, we use add-dissolve$(q, D_{v_j})$, 
where $j \in \{i + 1, \ldots, c(v)\}$. Once a function is “converted”, it always remains 
“converted”. For an add operation involving one “converted” function $q_1$ and one 
“unconverted” function $q_2$, we use add-dissolve$(q_1, q_2)$, which runs in $O(|q_2| \log H)$ time, 
and for two “converted” functions $q_1, q_2$ we use add-collide$(q_1, q_2)$, which runs in 
$O(H)$ time, since the size of the bigger data structure is now bounded by $H$. 

Thus, we do not store more information about $D_v$ than required, while main- 
taining enough of it to calculate the required parts of $D_u$, where $u$ is $v$'s parent. 

4.4 Analysis of Algorithm-2 

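Let $Y$ be the set of all nodes $v \in V$ for which we needed to use convert when 
computing $D_v$. The use of convert implies $n(v) = |Tree(v)| > H$, in light 
of Lemma 2. Moreover, $|Y| \le N/H$, because for any two $u, v \in Y$, $Tree(u) \cap Tree(v) = \emptyset$ 
(convert is applied only at nodes none of whose children holds a converted function, so no 
node of $Y$ is an ancestor of another). 

Let $W$ be the union of $Y$ and all ancestors of nodes in $Y$. The subgraph of $M$ induced on 
$W$ forms an upper subtree of $M$. Let $U$ be the set of children of nodes in $Y$. 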
Figure: the marked nodes indicate where the convert function is used. 



For all nodes $v \in U$, we run the normal add_merge procedure. Since the final 
$D_v$'s constructed for each $v \in U$ have sizes less than $H$, the total cost of building 
them is at most $|U| \log H \le N \log H$. 

For each node $v \in Y$, we do a few add_merge operations, followed by a 
convert operation, followed by a few add-dissolve and add-collide operations. Let 
$X$ be the sorted set of $f(p)$ values for the ancestors of $v$, so $1 \le |X| \le H$. Since the 
add_merge operations together lead to a data structure of size at most 
$2H$, their total cost is $O(H \log H)$. convert costs at most $|X| \log(2H/|X|) = O(H)$. 

If we sum over all add-dissolves performed during the run of the algorithm, 
it is easy to see that at most $N$ breakpoints are dissolved into data structures 
of size $H$. So the total cost of add-dissolves is at most $N \log H$. 

Further, there are at most $N/H$ “converted” functions, and each add-collide takes 
$O(H)$ time and reduces the number of “converted” functions by one. Hence, the total cost of 
add-collide is $O(N)$. 

Thus, the overall cost is at most 

$$N \log H + \frac{N}{H}\,(H \log H + H) + N \log H + N,$$

that is, $O(N \log H)$. 

Theorem 1. Given the multicast tree $M$ having $N$ nodes and height $H$, with the 
source at root $r$ disseminating $I$ amount of information, along with the values $f(v)$ 
giving the minimum amount of information required at each node $v$ of the multicast 
tree, the placement of filters in the multicast tree that minimizes total delay can be 
computed in $O(N \log H)$ time. 

5 Data-Structures 

In the previous sections, we assumed the existence of a data structure to 
maintain non-decreasing piecewise-linear functions. Here, we describe the data 
structure along with the implementation of the operations. 

The data structure maintains the breakpoints (or each value in $X$ for 
a converted function) sorted by $x$ coordinate in an AVL tree |l 1112]. An AVL 
tree is a balanced binary search tree in which, for any node, the heights of its 
left and right subtrees differ by at most one. Along with the $x$ coordinate 
of the breakpoint, each node also contains two real numbers $a$ and $b$ such 
that the linear segment to the left of the breakpoint has equation $y = Ax + B$, 
where $A$ (resp. $B$) is the sum of all the $a$ (resp. $b$) values on the path from the 
node to the root of the tree. A dummy breakpoint at $x = +\infty$ is included 
in the tree to encode the rightmost linear piece of the function. 

Each node also contains a mark for handling truncated parts of a function. 
A node is invalid (i.e., its $a$ and $b$ values are not correct) if it or a node on 
its path to the root is marked. The linear function at the $x$ value of an invalid 
node is the same as the function of the first valid node that follows 
it in inorder. The node at $x = +\infty$ is always valid. Every time we 
visit a marked node during any of the operations, we unmark it, correct its $a$ 
and $b$ values, and mark its two children. This ensures that the only invalid nodes 
are those the algorithm does not see. This marking scheme is necessary 
to implement truncate-converted, which is truncate applied to converted functions, 
since we cannot delete the nodes in that case. 




Figure: sorted lists represented as height-balanced trees; merging by sequential insertions (square nodes have been inserted). 



The data structure uses the AVL tree merging algorithm of Brown and 
Tarjan m to implement add_merge, convert and add-dissolve. Given two AVL 
trees, $T_1$ with $n$ elements and $T_2$ with $m$ elements, $m \le n$, we search for and insert 
the elements of $T_2$ into $T_1$ in sorted order, but instead of performing each search 
from the root of the tree, we start from the last searched node, climb up to the 
first ancestor (the LCA) having the next element to search for in its subtree, and continue 
searching down the tree from there. Brown and Tarjan show that the total 
number of operations performed during this procedure is $O(m \log((n + m)/m))$. 
This method can also be used to search for or visit $m$ sorted values in $T_1$ within 
$O(m \log(n/m))$ time. 
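The essential trick, resuming each search near the previous one rather than at the root, can be illustrated on a sorted Python list. The sketch below uses a swapped-in technique, exponential ("galloping") search, not Brown and Tarjan's AVL-tree algorithm. A query whose answer lies $d$ positions after the previous answer costs $O(1 + \log(d + 1))$ comparisons, which for $m$ sorted queries over $n$ keys sums to $O(m \log((n + m)/m) + m)$. It returns the successor index of each query, as needed by convert. The function name is hypothetical.

```python
from bisect import bisect_left


def successors(keys, queries):
    """For each query q (queries sorted ascending), return the index of the
    first key >= q in the sorted list `keys` (len(keys) if there is none).

    Each search starts at the previous answer and gallops forward with
    steps 1, 2, 4, ... before a binary search inside the final gap.
    """
    result, lo, n = [], 0, len(keys)
    for q in queries:
        # Invariant: every key before index lo is < q.
        step, hi = 1, lo
        while hi < n and keys[hi] < q:
            lo = hi + 1
            hi += step
            step *= 2
        # The answer lies in [lo, min(hi, n)]; binary search only there.
        lo = bisect_left(keys, q, lo, min(hi, n))
        result.append(lo)
    return result
```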

In order to add two functions while merging the corresponding AVL trees 
using Brown and Tarjan's method, we also need to update the $a$ and $b$ values 
of the nodes. First, when we insert a new node into the tree, we find its “inherited” 
$A$ and $B$ values and adjust its $a$ and $b$ values accordingly. Then we consider the 
effect of adding the linear piece $y = \alpha x + \beta$ to its right in the previous data 
structure it came from. This can be done along the same walk in the 
tree. While walking in the tree from an element $u$ to the next element to be 
inserted, $v$, we need to add the linear function joining them, say 
$\alpha x + \beta$, to all the nodes between $u$ and $v$. To do that, we add $\alpha$ and $\beta$ to the $a$ 
and $b$ values of the least common ancestor (LCA) of $u$ and $v$. Now the function 
values for all the nodes between $u$ and $v$ have been increased correctly, but some 
nodes outside that range might have been increased as well. To correct that, 
we walk down from the LCA to $u$. This is a series of right-child and left-child 
choices, the first being left. In this series, whenever we choose a right child after 
some (non-empty) sequence of left-child choices, we subtract the tuple $(\alpha, \beta)$ at that 
node. Similarly, whenever we choose a left child after a (non-empty) sequence of 
right-child choices, we add the tuple $(\alpha, \beta)$ to the node where the choice is made. 
Symmetrically, the $a$ and $b$ values can be adjusted along the path from the LCA to $v$. 
Thus, updates are only required along Brown and Tarjan's search path. To 
complete the argument, one can verify that the validity of the $a$ and $b$ values 
of the nodes is preserved under the rotations and double rotations performed 
by AVL insertion. The figure illustrates the insert path along with the updates 
due to the linear segment $\alpha x + \beta$ between the inserted points F and J. 
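The path-sum encoding can be illustrated with a simpler structure. The Python sketch below stores the $(a, b)$ increments of a fixed array of breakpoints in a Fenwick (binary indexed) tree. Adding $\alpha x + \beta$ to a contiguous range of segments then takes two point updates, and reading $(A, B)$ for a breakpoint is one prefix sum, both $O(\log n)$. This is a swapped-in technique for illustration only: unlike the AVL tree above, it supports neither insertions nor rotations, and the class name is hypothetical.

```python
class LinearRangeAdder:
    """Segments y = A_k x + B_k for breakpoints 0..n-1, stored implicitly.

    A Fenwick tree over a difference array holds (a, b) increments, so
    (A_k, B_k) is a prefix sum: a range add is two point updates and a
    query is one prefix sum, both O(log n).
    """

    def __init__(self, n):
        self.n = n
        self.tree = [[0.0, 0.0] for _ in range(n + 1)]

    def _point_add(self, i, da, db):
        i += 1  # Fenwick trees are 1-indexed
        while i <= self.n:
            self.tree[i][0] += da
            self.tree[i][1] += db
            i += i & -i

    def range_add(self, lo, hi, alpha, beta):
        """Add alpha * x + beta to the segments of breakpoints lo..hi (inclusive)."""
        self._point_add(lo, alpha, beta)
        if hi + 1 < self.n:
            self._point_add(hi + 1, -alpha, -beta)

    def segment(self, k):
        """Return (A_k, B_k) for breakpoint k as a prefix sum."""
        a = b = 0.0
        i = k + 1
        while i > 0:
            a += self.tree[i][0]
            b += self.tree[i][1]
            i -= i & -i
        return a, b
```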

We now outline the workings of the different operations: 

create{a, b): Create a new AVL tree with 1 node at a; = -l-oo, and set its a and 
b values. 

addjmerge{fi, / 2 ): Use the method of Brown and Tarjan as described above. 
truncate{f , t): Find z such that f{z) = thy performing a search in the tree. As 
we go down the tree, we maintain A and B, the sum of the a and b values 
of all the ancestors of the current node. This way, we can compute the value 
of f{x) for the x value of each of the nodes visited. Since the function / is 
non-decreasing, the tree is also a binary search tree for the f{x) values. Once 
the search is done, we find the linear segment for which Az + B = t, and 
thus find z. We then insert a new breakpoint in the tree at a; = z, and delete 
all the break-points in / which come after z (except the one at -l-oo) one- 
by-one, using usual AVL-tree deletion. Add the line segment (0,t) between 
z and -l-oo. 

probe($f, t$): Search the tree for the successor of $t$ among the $x$ values. Compute the $A$ and $B$ sums on the path, and return $f(t) = At + B$.
convert($f, X$): Use the method of Brown and Tarjan to find the successors of all $x_i \in X$ in $O(k \log(n/k))$ time. Evaluate $f(x_i)$ at each of those values, and construct a new AVL tree for a piecewise linear function with the $x_i$ values as breakpoints, joining each pair of adjacent breakpoints with an ad-hoc linear function.

add-dissolve($f_X, g$): Just like in add-merge, but we do not insert the breakpoints; we just update the $a$ and $b$ values of the existing breakpoints.
add-collide($f_{X_1}, g_{X_2}$): Find the values of $f$ and $g$ on $X_1 \cap X_2$ and construct a new AVL tree as in convert.

truncate-converted($f_X, t$): As in truncate, we find $z$. But in this case we do not insert the new breakpoint, nor do we delete the breakpoints in $f_X$ after $z$. Instead, we invalidate the $(a, b)$ values of the remaining points by marking the right child of the nodes traversed from the left, and we adjust the $(a, b)$ value at these nodes so that the linear function reflects the line $y = t$, while walking up from the position of $z$ to the root. Once at the root, we walk down to $+\infty$, validating the $a$ and $b$ values on that path. We then set the $a$ and $b$ values at $+\infty$ such that $A = 0$ and $B = t$. It returns $z$.
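The path-summed $(a, b)$ labels behind probe and the search step of truncate can be illustrated with a static sketch. The Python below is a simplification, not the structure described above: a balanced binary search tree is built once from sorted breakpoints, with no AVL insertion, deletion or rotations. Each node stores its linear piece minus its parent's piece, so summing the labels on the root path recovers the piece. `Node`, `build`, `probe` and `find_level` are hypothetical names, and $f$ is assumed continuous and non-decreasing.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    x: float                        # breakpoint; math.inf for the last one
    a: float                        # slope offset relative to the parent
    b: float                        # intercept offset relative to the parent
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(xs, pieces, lo=0, hi=None, pa=0.0, pb=0.0):
    """Balanced BST over sorted breakpoints xs. pieces[i] = (slope, intercept)
    of the linear piece ending at xs[i]. Each node stores its piece minus its
    parent's piece, so summing (a, b) along the root path recovers the piece."""
    if hi is None:
        hi = len(xs)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    s, c = pieces[mid]
    node = Node(xs[mid], s - pa, c - pb)
    node.left = build(xs, pieces, lo, mid, s, c)
    node.right = build(xs, pieces, mid + 1, hi, s, c)
    return node


def probe(root, t):
    """f(t): walk to the successor of t among the breakpoints, summing (a, b)."""
    A = B = 0.0
    best = None
    node = root
    while node is not None:
        A, B = A + node.a, B + node.b
        if t <= node.x:             # node.x is a successor candidate
            best = (A, B)
            node = node.left
        else:
            node = node.right
    return best[0] * t + best[1]


def find_level(root, t):
    """Search step of truncate: z with f(z) = t on the first piece whose right
    end reaches t (f non-decreasing, so f(x) is sorted along with x).
    Returns None if f never reaches t; on a flat piece returns its right end."""
    A = B = 0.0
    best = None
    node = root
    while node is not None:
        A, B = A + node.a, B + node.b
        if node.x == math.inf:
            fx = math.inf if A > 0 else B
        else:
            fx = A * node.x + B
        if fx >= t:
            best = (A, B, node.x)
            node = node.left
        else:
            node = node.right
    if best is None:
        return None
    A, B, x = best
    return (t - B) / A if A != 0 else x
```

Take breakpoints $0, 2, 5, +\infty$ with pieces $0$, $x$, $0.5x + 1$, $3.5$. Then `probe(root, 4)` evaluates to 3.0 and `find_level(root, 3)` to 4.0.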




Algorithms for Efficient Filtering in Content-Based Multicast 439



6 Related Problems and Future Work 

Future work in this area is to design an algorithm based on similar methods for the first model of [?]. Here, the dynamic programming algorithm runs in $O(pn^2)$ time, since the objective function at each node involves two parameters. This model, where only $p$ filters are allowed, is a variant of $p$-median on trees where all the links are directed away from the root. The problem is called $p$-inmedians in [?]. When $p > n$ it reduces to our problem. The optimal dynamic programming algorithm known for uncapacitated facility location (which is $p$-median with $p > n$) over trees takes $O(n^2)$ time. Interesting future work is to find faster algorithms for uncapacitated facility location [?] and $p$-median [?]. That could also give us faster algorithms for the $p$-forest [?] and tree partitioning problems [?].

Acknowledgments. We would like to thank John Iacono and Michael Fredman for useful suggestions regarding the choice of data structures, and Farooq Anjum and Ravi Jain for providing useful background on the problem.
Abstract. We give a polynomial-time $O(\log n / \log \mathrm{OPT})$-approximation algorithm for minimum-time broadcast and minimum-time multicast in $n$-node networks under the single-port vertex-disjoint paths mode. This improves a previous approximation algorithm by Kortsarz and Peleg. In contrast, we give an $\Omega(\log n)$ lower bound for the approximation ratio of the minimum-time multicast problem in directed networks. This lower bound holds unless $NP \subset DTIME(n^{\log\log n})$. An important consequence of this latter result is that the Steiner version of the Minimum Degree Spanning Tree (MDST) problem in digraphs cannot be approximated within a constant ratio, as opposed to the undirected version. Finally, we give a polynomial-time $O(1)$-approximation algorithm for minimum-time gossip (i.e., all-to-all broadcast).

Keywords: Approximation Algorithms, Graph and Network Algo- 
rithms, Broadcasting, Multicasting, Gossiping, Minimum Degree Span- 
ning Tree. 



1 Introduction 

Given a node $s$ of a network and a set of nodes $D$, multicasting from $s$ to $D$ consists of transmitting a piece of information from $s$ to all nodes in $D$ using the communication facilities of the network. Broadcasting is the particular case in which the destination set is composed of all nodes. Given a set of nodes $D$, gossiping in $D$ consists of performing a multicast from $s$ to $D$ for every node $s \in D$. These communication patterns are basic operations upon which network applications are frequently based, and they have hence given rise to a vast literature, covering both applied and fundamental aspects of the problem (cf. [?] and [?], respectively).

A standard communication model assumes the network to be a connected graph $G = (V, E)$. Transmissions proceed by synchronous calls between the nodes of the network. It is generally assumed that (1) a call involves exactly two nodes, (2) a node can participate in at most one call at a time (single-port constraint), and (3) the duration of a call is 1 (assuming that the two nodes involved in the call exchange a constant amount of information). Two main variants of this model have then been investigated: the local model and the line model. The former states that calls can be placed between neighboring nodes only, whereas the latter allows calls to be placed between non-neighboring nodes (i.e., a call is a path in the graph $G$, whose two extremities are the “caller” and the “callee”). The local model aims to model switching technologies such as store-and-forward, whereas the line model aims to model “distance-insensitive” switching technologies such as circuit-switching, wormhole, single-hop WDM in optical networks, and virtual paths in ATM networks. For both variants, the notion of time is captured by counting communication rounds, where round $t$ is defined as the set of all calls performed between time $t-1$ and time $t$, $t = 1, 2, \ldots$ A multicast (resp., broadcast, gossip) protocol is simply described by the list of calls performed in the graph to complete the multicast (resp., broadcast, gossip).



Notation. We denote by $m_s(G, D)$ the minimum number of rounds required to perform multicast from $s$ to $D \subseteq V$ in $G = (V, E)$. Similarly, we denote by $b_s(G)$ the minimum number of rounds required to perform broadcast from $s$ in $G$, that is, $b_s(G) = m_s(G, V)$. Finally, we denote by $g(G, D)$ the minimum number of rounds required to perform gossip among the nodes of $D$. If $D = V$, $g(G, V)$ is simplified to $g(G)$.

Of course, these numbers depend on whether we consider the local or the 
line model. This will be specified later. 



Definition 1. The multicast problem is defined as follows. Given any graph $G = (V, E)$, any source node $s \in V$, and any destination set $D \subseteq V$, compute a multicast protocol from $s$ to $D$ performing in $m_s(G, D)$ rounds. The broadcast and gossip problems are defined similarly.

These problems are inherently difficult, and only approximate solutions can be expected in polynomial time.



Definition 2. An algorithm for the multicast problem is a $\rho$-approximation algorithm if, for any instance $(G, s, D)$ of the problem, it returns a multicast protocol from $s$ to $D$ in $G$ which completes in at most $\rho \cdot m_s(G, D)$ rounds. Approximation algorithms for the broadcast and gossip problems are defined similarly.



A large literature has been devoted to the description of broadcast and gossip protocols performing in a small number of rounds. Under the local model, many authors have considered specific topologies (see, e.g., [?]), but much effort has also been devoted to deriving approximation algorithms for arbitrary topologies [?]. The line model has also been extensively investigated, in its two variants: the edge-disjoint paths mode [?] and the vertex-disjoint paths mode [?]. The former mode specifies that the paths joining the participants of simultaneous calls must be pairwise edge-disjoint, whereas the latter specifies that they must be pairwise vertex-disjoint.

The vertex-disjoint paths mode is motivated by several studies which pointed out that avoiding node congestion is a critical issue, especially in the context of several multicasts occurring simultaneously. The remainder of this paper is entirely dedicated to this latter communication mode. Let us hence recall the model precisely, so that no ambiguity can arise.



Definition 3. (Vertex-disjoint paths mode.) Communications proceed by a sequence of synchronous calls of duration 1. A call involves exactly two nodes, the caller and the callee, and a unit piece of information can be transmitted from the caller to the callee during the call. The two participants of a call can be at distance greater than one in the network, i.e., a call is a path (not necessarily minimal) between the caller and the callee. Any two simultaneous calls (i.e., paths) must be vertex-disjoint. A round, i.e., a set of calls performed simultaneously, is therefore a set of pairs of nodes that are matched by pairwise vertex-disjoint paths.
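Definition 3 can be turned into a mechanical check. The Python sketch below tests whether a proposed round is legal: every call must be a simple path of $G$ between two distinct nodes, and the calls must be pairwise vertex-disjoint. The function name and the graph encoding (a dictionary of neighbour sets) are illustrative assumptions.

```python
def is_valid_round(adj, calls):
    """Check one round under the vertex-disjoint paths mode.

    adj: dict node -> set of neighbours (undirected graph G).
    calls: list of calls, each given as the list of vertices of its path,
    from the caller (first) to the callee (last).
    """
    used = set()
    for path in calls:
        if len(path) < 2 or len(set(path)) != len(path):
            return False    # a call joins two distinct nodes by a simple path
        if any(b not in adj.get(a, ()) for a, b in zip(path, path[1:])):
            return False    # consecutive vertices must be adjacent in G
        if used & set(path):
            return False    # simultaneous calls must be vertex-disjoint
        used |= set(path)
    return True
```

In the ring $C_6$, for instance, the calls 0-1-2 and 3-4 form a valid round, while 0-1-2 and 2-3 do not, since they share node 2.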

Although the vertex-disjoint paths mode allows fast broadcast in some networks (e.g., $\lceil \log_2 n \rceil$ rounds in the ring $C_n$), there are networks which require $\Omega(n)$ rounds to perform broadcast (e.g., the star $S_n$ of $n$ nodes requires $n - 1$ rounds). The vertex-disjoint paths mode can hence be very slow or very fast depending on the network, and this is true even if we restrict the study to specific families of graphs such as trees or interval graphs. The multicast, broadcast and gossip problems are actually NP-complete in arbitrary graphs, and hence much effort has been devoted to specific topologies (cf. [?]) and to trees (cf. [?]). For arbitrary graphs, lower bounds for the gossip problem when calls can transmit an unbounded number of pieces of information can be found in [2,4]. As far as the broadcast problem is concerned, Kortsarz and Peleg [23] have shown that there exists a polynomial-time $O(\log n / \log\log n)$-approximation algorithm.

The main contributions of this paper are the following: 

1. We derive a polynomial-time $O(\log |D| / \log \mathrm{OPT})$-approximation algorithm for the multicast problem. This improves the best previously known approximation algorithm for minimum-time broadcast [23], and yields a constant approximation ratio for graphs of broadcast time $\mathrm{OPT} = \Omega(n^\epsilon)$, $\epsilon > 0$. Our algorithm is based on an algorithm by Fürer and Raghavachari [17] for the Minimum Degree Spanning Tree (MDST) problem.

2. We show that the nature of the multicast problem changes considerably from undirected to directed networks. In the latter case, we prove that, unless $NP \subset DTIME(n^{\log\log n})$, optimal solutions of the multicast problem cannot be approximated in polynomial time within less than an $\Omega(\log n)$ multiplicative factor. This result is obtained from an approximation threshold for the Minimum Set Cover problem due to Feige [12]. Nevertheless, we extend our approximation algorithm to directed graphs, though for the broadcast problem only. We note that this extension does not hold with the protocol in [?].

3. Besides the study of information dissemination problems, a direct consequence of our lower bound on the approximation ratio for the multicast problem is that the optimal solution of the Steiner version of the MDST problem in digraphs cannot be approximated in polynomial time within less than an $\Omega(\log n)$ multiplicative factor, nor within less than an $\Omega(\log n \log\log n)$ additive factor, again unless problems in NP can be solved in slightly more than polynomial time.

4. Finally, we show that the minimum gossip time can be approximated within 
a constant multiplicative factor in undirected networks. 

Section 2 presents our $O(\log |D| / \log \mathrm{OPT})$-approximation algorithm for the multicast problem. Section 3 revisits the multicast problem in directed networks, and includes the lower bounds on the approximation ratio of the MDST problem. Section 4 presents our $O(1)$-approximation algorithm for the gossip problem. Finally, Section 5 contains some concluding remarks.



2 Approximation Algorithm 



Let us start by computing a lower bound on the number of rounds required for 
multicasting. 



Notation. Given any graph $G = (V, E)$ and any set $D \subseteq V$, let $\Delta_{\min}(G, D)$ be the smallest integer $k$ such that there exists a tree of maximum degree $k$ spanning $D$ in $G$. If $D = V$, then $\Delta_{\min}(G, D)$ is simplified to $\Delta_{\min}(G)$. $\Delta_{\min}(G, D)$ is abbreviated to $\Delta_{\min}$ if no confusion can arise from this simplification.

In the remainder of the paper, we assume, w.l.o.g., that $s \in D$.

Lemma 1. Let $G = (V, E)$, $s \in V$, and $D \subseteq V$. We have
$$m_s(G, D) \geq \max\{\lceil \log_2 |D| \rceil, \lceil \Delta_{\min}/2 \rceil\}.$$

Proof. The single-port constraint implies that at least $\lceil \log_2 |D| \rceil$ rounds are required, because the number of informed nodes can at most double at each round. Let $\mathcal{M}$ be an optimal multicast protocol from $s$ to $D$ in $G$, that is, a protocol performing in $m_s(G, D)$ rounds. Let $H$ be the graph induced by the union of all the calls, i.e., all the paths, of $\mathcal{M}$. $H$ is a connected subgraph of $G$ which spans $D$. Let $\Delta$ be the maximum degree of the nodes in $H$. The vertex-disjoint constraint implies that $\mathcal{M}$ completes in at least $\lceil \Delta/2 \rceil$ rounds: in each round, a node belongs to at most one call, and a call uses at most two of its incident edges. Therefore, since $\Delta_{\min} \leq \Delta$ ($H$ contains a tree spanning $D$), $\mathcal{M}$ completes in at least $\lceil \Delta_{\min}/2 \rceil$ rounds. □

It is a trivial observation that multicasting in Hamiltonian graphs can be done in $\lceil \log_2 |D| \rceil$ rounds by just using a Hamiltonian cycle. Actually, multicasting can also be achieved in $\lceil \log_2 |D| \rceil$ rounds if there is a Hamiltonian partial subgraph of $G$ spanning $D$. Our approximation algorithm for the multicast problem is based on approximate solutions for the Minimum Degree Spanning Tree problem (MDST for short), defined as follows. We are given a graph $G = (V, E)$, and we are looking for a spanning tree whose maximum degree is $\Delta_{\min}(G)$. In the Steiner version of the problem, we are given a graph $G = (V, E)$ and a vertex set $D \subseteq V$, and we are looking for a tree spanning $D$ whose maximum degree is $\Delta_{\min}(G, D)$. A $(\rho, r)$-approximation algorithm for the MDST problem is an algorithm which, given $G$ and $D$, returns a tree spanning $D$ of maximum degree at most $\rho \cdot \Delta_{\min}(G, D) + r$.
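The $\lceil \log_2 |D| \rceil$-round multicast along a Hamiltonian cycle can be sketched on a Hamiltonian path starting at the source (the cycle minus one edge at $s$) by repeated halving. The informed left end of each block of destinations calls the first destination of the block's right half through the sub-path between them. Calls of the same round therefore lie in disjoint blocks and are vertex-disjoint. The Python below is a minimal sketch under that assumption; positions along the path stand for vertices, and the function names are hypothetical.

```python
import math


def path_multicast(targets):
    """Multicast along a path by repeated halving (vertex-disjoint paths mode).

    targets: increasing positions along the path; targets[0] is the informed
    source, the others are the destinations. Intermediate positions only relay.
    Returns a list of rounds, each a list of (caller, callee) positions; a call
    occupies every position between caller and callee.
    """
    rounds, blocks = [], [(0, len(targets) - 1)]  # index ranges, left end informed
    while any(hi > lo for lo, hi in blocks):
        calls, nxt = [], []
        for lo, hi in blocks:
            if hi == lo:
                nxt.append((lo, hi))
                continue
            mid = lo + (hi - lo + 1) // 2         # first index of the right half
            calls.append((targets[lo], targets[mid]))
            nxt += [(lo, mid - 1), (mid, hi)]
        rounds.append(calls)
        blocks = nxt
    return rounds


def rounds_lower_bound(k):
    """ceil(log2 k): the number of informed nodes at most doubles per round."""
    return math.ceil(math.log2(k))
```

With 8 destinations (source included) the sketch uses 3 rounds, matching the doubling lower bound of Lemma 1.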

Lemma 2. (Fürer and Raghavachari [17]) There exists a polynomial-time $(1, 1)$-approximation algorithm for the Steiner version of the MDST problem in graphs.



Lemma 3. Let $t(n)$ be the complexity of a $(1, 1)$-approximation algorithm for the Steiner version of the MDST problem in $n$-node graphs. For any $m$, $2 \leq m \leq n$, there exists a $t(n)$-time algorithm which, for any $n$-node graph $G = (V, E)$, any set $D \subseteq V$, and any node $s \in V$, returns a multicast protocol from $s$ to $D$ which performs in
$$O\left(\frac{\log |D|}{\log m} \cdot (m + \log |D| + \Delta_{\min})\right)$$
rounds.
Proof. Let $G$ be an $n$-node graph, and $2 \leq m \leq n$. Let $T$ be a tree of degree $\Delta_{\min} + 1$ spanning $D$ in $G$. Consider $T$ as rooted at $s$. The proof simply consists of showing how to construct a multicast protocol from $s$ to $D$ in $T$ which performs in $O\left(\frac{\log |D|}{\log m} \cdot (m + \log |D| + \Delta_{\min})\right)$ rounds. We look for subtrees of $T$ containing at most $|D|/m$ destination nodes. For that purpose, we perform a depth-first search (DFS) traversal of $T$, starting from $s$, with the following restriction: we visit a child $x$ of the current node if and only if the subtree of $T$ rooted at $x$ contains at least $|D|/m$ destination nodes. The visited nodes form a subtree $T'$ of $T$. The leaves of $T'$ are roots of subtrees of $T$ containing at least $|D|/m$ destination nodes. Let $T''$ be the subtree of $T$ composed of $T'$ plus all the children of the nodes in $T'$. We describe a 3-phase multicast protocol from $s$ to the leaves of $T''$, performing in at most $O(m + \log |D| + \Delta_{\min})$ rounds.
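The restricted DFS that produces $T'$ and $T''$ can be sketched as follows. This Python sketch assumes $T$ is given as a dictionary mapping each node to its list of children, rooted at $s$. It first counts the destinations in every subtree, then enters only those children whose subtree holds at least $|D|/m$ destinations. The names are hypothetical. The leaves of $T'$ root disjoint subtrees, each holding at least $|D|/m$ destinations, so at most $m$ leaves are returned.

```python
def restricted_dfs(children, root, dest, m):
    """Build T' and T'' as in the proof of Lemma 3.

    children: dict node -> list of children (the tree T rooted at `root`).
    dest: set of destination nodes D; m: the tuning parameter, m >= 2.
    Returns (t1, leaves1, t2): nodes of T', leaves of T', nodes of T''.
    """
    # Preorder; reversing it lists every node after all of its descendants.
    order, stack = [], [root]
    while stack:
        x = stack.pop()
        order.append(x)
        stack.extend(children.get(x, ()))
    count = {}
    for x in reversed(order):
        count[x] = int(x in dest) + sum(count[c] for c in children.get(x, ()))
    threshold = len(dest) / m
    # DFS that only enters children whose subtree holds >= |D|/m destinations.
    t1, stack = set(), [root]
    while stack:
        x = stack.pop()
        t1.add(x)
        stack.extend(c for c in children.get(x, ()) if count[c] >= threshold)
    leaves1 = {x for x in t1 if not any(c in t1 for c in children.get(x, ()))}
    t2 = t1 | {c for x in t1 for c in children.get(x, ())}
    return t1, leaves1, t2
```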

Phase 1 is a multicast from s to all the leaves of T'. Node s proceeds sequen- 
tially by calling successively every leaf. The number of leaves of T' is at most 
m because every leaf is the root of a subtree of T of at least \D\/m destination 
nodes, and there are \D\ destination nodes in total. Therefore, Phase 1 takes at 
most m rounds. 

Phase 2 consists of informing specific nodes of $T'$ from its leaves. For that purpose, observe that, given any tree, one can decompose the tree into a set $\mathcal{P}$ of pairwise vertex-disjoint paths so that every internal node is linked to exactly one leaf by exactly one path in $\mathcal{P}$. This operation is called a path-decomposition of the tree. Given a path-decomposition $\mathcal{P}$ of $T'$, Phase 2 is performed as follows. For every leaf $x$, let $P_x$ be the path of $\mathcal{P}$ containing $x$. Each leaf $x$ performs a multicast in $P_x$ to all nodes of $P_x$ that either belong to $D$ or have at least one child in $T''$. Since there are at most $|D|$ such nodes along each path $P_x$, Phase 2 completes in at most $\lceil \log_2 |D| \rceil$ rounds.
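A path-decomposition of the kind used in Phase 2 can be obtained by letting every node continue its path through one child and start a new path at each of its other children. The Python sketch below assumes the tree is given as a dictionary mapping each node to its list of children, together with the node set of the subtree to decompose (for instance $T'$). Every returned path goes top-down and ends at a leaf of that subtree, so each internal node lies on exactly one path leading to exactly one leaf. The names are hypothetical.

```python
def path_decomposition(children, root, nodes):
    """Split the subtree induced by `nodes` (which contains `root`) into
    vertex-disjoint top-down paths, each ending at a leaf of that subtree."""
    paths, heads = [], [root]
    while heads:
        x = heads.pop()
        path = [x]                      # x starts a new path
        while True:
            kids = [c for c in children.get(x, ()) if c in nodes]
            if not kids:                # x is a leaf of the subtree: path ends
                break
            heads.extend(kids[1:])      # the other children start their own paths
            x = kids[0]                 # the path continues through the first child
            path.append(x)
        paths.append(path)
    return paths
```

The number of paths equals the number of leaves of the decomposed subtree, and the construction takes time linear in its size.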

Phase 3 consists of informing the leaves of $T''$ from the nodes of $T'$. For that purpose, every node of $T'$ which is aware of the message after Phases 1 and 2 informs all its children in $T''$. Since every node in $T$ has degree at most $\Delta_{\min} + 1$, Phase 3 takes at most $\Delta_{\min} + 1$ rounds.

Once this 3-phase multicast protocol is completed, we are left with multicast problems in subtrees of $T$ containing at most $|D|/m$ destination nodes. Multicasts in these trees can be performed in parallel. The whole multicast protocol consists of at most $O(\log |D| / \log m)$ repetitions of this strategy.

The construction of the protocol requires $t(n)$ time to construct the tree $T$. We use $O(n)$ time to count the number of destination nodes in every subtree $T_x$ of $T$, $x \in V(T)$. The path-decomposition $\mathcal{P}$ can be constructed in $O(|V(T')|)$ time, and each multicast protocol along $P_x \in \mathcal{P}$ can be computed in $O(|P_x|)$ time. As a result, once $T$ is set up, the whole multicast protocol can be computed in $O(n)$ time. So the total complexity of the construction is dominated by the extraction of the MDST of $D$ in $G$. □

For $D = V$, by choosing $m = \log n$, Lemmas 1, 2 and 3 together show that there exists a polynomial-time $O(\log n / \log\log n)$-approximation algorithm for the broadcast problem, as already shown by Kortsarz and Peleg [23] using other techniques. For the multicast problem, $m = \log |D|$ yields a polynomial-time $O(\log |D| / \log\log |D|)$-approximation algorithm. More interestingly, Lemma 3 allows $m$ to be tuned to get a smaller approximation ratio.

Theorem 1. Let $t(n)$ be the complexity of a $(1, 1)$-approximation algorithm for the Steiner version of the MDST problem in $n$-node graphs. There exists a $t(n)$-time $O(\log |D| / \log \mathrm{OPT})$-approximation algorithm for the multicast problem.

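Proof. Let $G = (V, E)$, $s \in V$, and $D \subseteq V$, and let $\mathrm{OPT} = m_s(G, D)$. Applying Lemma 3 with $m = \log |D|$ results in a multicast protocol $\mathcal{M}$ from $s$ to $D$ in $G$ which completes in time $t = O\left(\frac{\log |D|}{\log\log |D|} \cdot \mathrm{OPT}\right)$. Thus
$$\frac{1}{c} \cdot \frac{\log\log |D|}{\log |D|} \cdot t \leq \mathrm{OPT} \leq t,$$
where $c$ is a positive constant. There are two cases:

- Case 1: $t \leq c \cdot \frac{\log^3 |D|}{\log\log |D|}$. In this case, $\log \mathrm{OPT} \leq \log t = O(\log\log |D|)$. Since $\mathrm{OPT} \geq \log_2 |D|$ by Lemma 1, we also have $\log \mathrm{OPT} = \Omega(\log\log |D|)$, and thus $\mathcal{M}$ completes in at most $O\left(\frac{\log |D|}{\log \mathrm{OPT}} \cdot \mathrm{OPT}\right)$ rounds.

- Case 2: $t > c \cdot \frac{\log^3 |D|}{\log\log |D|}$. In this case, $\mathrm{OPT} > \log^2 |D|$. Then we re-apply Lemma 3 with $m = \frac{\log\log |D|}{c \log |D|} \cdot t \leq \mathrm{OPT}$. The resulting multicast protocol $\mathcal{M}'$ completes in
$$O\left(\frac{\mathrm{OPT} \cdot \log |D|}{\log t - \log\log |D| + \log\log\log |D| - \log c}\right)$$
rounds. We have $\log t \geq \log \mathrm{OPT}$, and thus $\mathcal{M}'$ completes in at most $O\left(\frac{\log |D|}{R} \cdot \mathrm{OPT}\right)$ rounds, where
$$R = \log \mathrm{OPT} - \log\log |D| + \log\log\log |D| - \log c.$$
Since $\mathrm{OPT} > \log^2 |D|$, we have $\frac{1}{2} \log \mathrm{OPT} > \log\log |D|$, and thus $R > \frac{1}{2} \log \mathrm{OPT} + \log\log\log |D| - \log c$, that is, $R = \Omega(\log \mathrm{OPT})$. Therefore, $\mathcal{M}'$ completes in at most $O\left(\frac{\log |D|}{\log \mathrm{OPT}} \cdot \mathrm{OPT}\right)$ rounds. □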



Remark. From Lemma 2, $t(n)$ can be chosen to be polynomial.

We note that trees deserve specific attention in the context of broadcast and multicast. Indeed, most of the multicast protocols for the Internet are based on a tree connecting the destination nodes [?]. Unfortunately, we were not able to prove that the multicast problem in trees is NP-complete, nor to prove that it is in P. In particular, the merging method presented in [?] does not seem to apply in this context. However, one can prove the following:

Theorem 2. There exists an $O(n \log n)$-time approximation algorithm for the multicast problem in trees. More precisely, the approximation algorithm returns a protocol that is optimal up to an additive factor of $2\lceil \log_2 |D| \rceil$.

Due to lack of space, the proof is omitted, but can be found in [?].
3 Broadcast and Multicast in Directed Networks

A natural question that arises in this context is how far we can extend Theorem 1 to directed networks. Surprisingly, we show that such an extension is not possible unless all problems in NP can be solved in time slightly larger than polynomial. The proof is based on an inapproximability threshold for the Minimum Set Cover problem (MSC for short). Let us recall that, in the MSC problem, we are given a collection $\mathcal{C}$ of subsets of a set $S$, and we are looking for the smallest set $\mathcal{C}_{\min} \subseteq \mathcal{C}$ such that every element of $S$ belongs to at least one set of $\mathcal{C}_{\min}$. Feige [12] has shown the following:

Lemma 4. (Feige [12]) Unless $NP \subset DTIME(n^{\log\log n})$, the optimal solution of the MSC problem is not approximable in polynomial time within $(1 - \epsilon) \ln |S|$ for any $\epsilon > 0$.

Theorem 3. Unless $NP \subset DTIME(n^{\log\log n})$, the optimal solution of the minimum-time multicast problem in digraphs is not approximable in polynomial time within $(1 - \epsilon) \ln |D|$ for any $\epsilon > 0$.

Proof. Let $(S, \mathcal{C})$ be an instance of the Minimum Set Cover problem. Let $G = (V, E)$ be the directed graph of $O(k \cdot |\mathcal{C}| \cdot |S|)$ vertices obtained from $(S, \mathcal{C})$ as follows ($k$ will be specified later). $G$ consists of $k$ “branches” attached to a center node $v$. Each branch $B$ is a copy of a graph of $O(|S| \cdot |\mathcal{C}|)$ vertices constructed as follows. Let $B' = (V', E')$ where
$$V' = \{v\} \cup \mathcal{C} \cup S$$
and
$$E' = \{(v, C),\ C \in \mathcal{C}\} \cup \{(C, s),\ C \in \mathcal{C},\ s \in C\}.$$
$B$ is obtained from $B'$ by replacing, for every $C \in \mathcal{C}$, the star $\{(C, s),\ s \in C\}$ by a binomial tree of at most $2|C|$ nodes, whose root is $C$, whose leaves are the nodes $s \in C$, and whose edges are directed from the root toward the leaves. (If $|C|$ is not a power of 2, then the binomial tree is pruned so that it contains $|C|$ leaves.) Since $|C| \leq |S|$, $B$ has at most $O(|S| \cdot |\mathcal{C}|)$ vertices, and the whole digraph $G$ has at most $O(k \cdot |S| \cdot |\mathcal{C}|)$ vertices. The construction of the multicast instance is completed by setting the source node to $v$ and the destination set $D$ to the $k|S|$ nodes of the $k$ copies of $S$ (one copy in each branch). We have
$$m_v(G, D) \leq k|\mathcal{C}_{\min}| + \lceil \log_2 |S| \rceil. \qquad (1)$$
Indeed, given $\mathcal{C}_{\min}$, the source $v$ sequentially informs the nodes of $\mathcal{C}_{\min}$ in each of the $k$ branches. This takes $k|\mathcal{C}_{\min}|$ rounds. Once informed, every node $C \in \mathcal{C}_{\min}$ multicasts to all nodes $s \in C$ in $\lceil \log_2 |C| \rceil$ rounds using the binomial tree rooted at $C$.

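Let $\mathcal{A}$ be a $\rho$-approximation algorithm for the multicast problem. Applied on $(G, v, D)$, $\mathcal{A}$ returns a multicast protocol $\mathcal{M}$ from $v$ to $D$ which performs in at most $\rho \cdot m_v(G, D)$ rounds. Moreover, $\mathcal{M}$ determines a collection $\mathcal{C}_1^*, \ldots, \mathcal{C}_k^*$ of subsets of $\mathcal{C}$, one for each of the $k$ branches: $\mathcal{C}_i^*$ is defined as the subset of nodes in the $i$th copy of $\mathcal{C}$ which received a call, or which were traversed by a call, during the execution of $\mathcal{M}$. By construction, each $\mathcal{C}_i^*$ covers $S$. Let $\mathcal{C}^*$ be such that $|\mathcal{C}^*| = \min_{1 \leq i \leq k} |\mathcal{C}_i^*|$. The single-port constraint implies that $\mathcal{M}$ completes in at least $k|\mathcal{C}^*|$ rounds, by definition of $\mathcal{C}^*$. Therefore, from Inequality (1), $\mathcal{A}$ computes $\mathcal{C}^*$ with
$$k|\mathcal{C}^*| \leq \rho \cdot \left(k|\mathcal{C}_{\min}| + \lceil \log_2 |S| \rceil\right).$$
Therefore, since $|\mathcal{C}_{\min}| \geq 1$,
$$|\mathcal{C}^*| \leq \rho \left(1 + \frac{\lceil \log_2 |S| \rceil}{k}\right) |\mathcal{C}_{\min}|.$$
In other words, for any $\epsilon > 0$, by choosing $k \approx \log |S| / \epsilon$, $\mathcal{A}$ allows us to approximate $|\mathcal{C}_{\min}|$ within $\rho(1 + \epsilon)$. Lemma 4 then yields $\rho \geq (1 - \epsilon) \ln |S|$ for any $\epsilon > 0$. We have $|D| = k|S| + 1$, and thus $\ln |D| = \ln |S| + o(\log |S|)$, which completes the proof. □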
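The choice of $k$ in the last step is simple arithmetic. The Python sketch below uses hypothetical helper names. It picks the smallest $k$ with $\lceil \log_2 |S| \rceil / k \leq \epsilon$, which is a precise form of $k \approx \log |S| / \epsilon$. It then evaluates the resulting bound $\rho(|\mathcal{C}_{\min}| + \lceil \log_2 |S| \rceil / k) \leq \rho(1 + \epsilon)|\mathcal{C}_{\min}|$, which relies on $|\mathcal{C}_{\min}| \geq 1$.

```python
import math


def branches_for(S_size, eps):
    """Smallest k with ceil(log2|S|) / k <= eps."""
    return math.ceil(math.ceil(math.log2(S_size)) / eps)


def cover_size_bound(rho, cmin, S_size, k):
    """Right-hand side of |C*| <= rho * (|C_min| + ceil(log2|S|) / k), which
    follows from k|C*| <= rho * (k|C_min| + ceil(log2|S|))."""
    return rho * (cmin + math.ceil(math.log2(S_size)) / k)
```

For $|S| = 1024$ and $\epsilon = 0.5$ this gives $k = 20$. With $\rho = 2$ and $|\mathcal{C}_{\min}| = 3$, the bound is $7 \leq 2 \cdot 1.5 \cdot 3 = 9$.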



Remark. One can check in [12] that Feige's Lemma still holds for instances $(S, \mathcal{C})$ such that $|\mathcal{C}| \leq |S|$. The proof of Theorem 3 hence yields an $\Omega(\log n)$ lower bound for the approximation ratio of the multicast problem.

Despite the negative result of Theorem 3, we extend Theorem 1 to directed graphs, though for the broadcast problem only. The MDST problem in directed graphs is defined as follows: we are given a digraph $G = (V, E)$ and a node $s \in V$, and we are looking for a directed spanning tree rooted at $s$ (i.e., whose edges are directed from the root towards the leaves) whose maximum degree is the smallest among all directed spanning trees rooted at $s$. The following lemma is a variant of Lemma 2 dedicated to directed graphs.

Lemma 5. (Fürer and Raghavachari [16]) There exists a polynomial-time $(1, \log n)$-approximation algorithm for the MDST problem in digraphs.

Note that this lemma does not apply to the Steiner version of the MDST problem. Nevertheless, it allows us to prove the following:

Theorem 4. Let $t(n)$ be the complexity of a $(1, \log n)$-approximation algorithm for the MDST problem in $n$-node directed graphs. There exists a $t(n)$-time approximation algorithm for the broadcast problem in directed graphs.

Proof. First note that, using the same technique as in the proof of Lemma 1, we get that the broadcast time from any source $s$ is at least $\lceil \log_2 n \rceil$, and at least the out-degree of an MDST rooted at $s$. The proof then follows the same guidelines as the proofs of Theorem 1 and Lemma 3, replacing the use of Lemma 2 by Lemma 5. The proof of Lemma 3 must also be slightly modified because Phase 2 involves calls from the leaves toward the root, that is, calls proceeding upward in the tree. We use the following result due to Kortsarz and Peleg:

Lemma 6. (Kortsarz and Peleg [24]) Given any directed tree $T$ rooted at $s$ and whose edges are directed from $s$ toward the leaves, broadcasting from $s$ to $T$ requires at most $2L + \lceil \log_2 n \rceil$ rounds, where $L$ is the number of leaves of $T$. Moreover, a $(2L + \lceil \log_2 n \rceil)$-round protocol can be computed in $O(n)$ time.

Using that lemma, the proof of Theorem 4 works the same as the proofs of Lemma 3 and Theorem 1. □
Fürer and Raghavachari left open in [?] the possible extension to directed graphs of their $(1, 1)$-approximation algorithm for the Steiner version of the MDST problem in undirected graphs. Theorems 1 and 3 show that the answer is negative. Indeed, we have:

Corollary 1. Unless $NP \subset DTIME(n^{\log\log n})$, any polynomial-time $(1, r)$-approximation algorithm for the Steiner version of the MDST problem in directed graphs satisfies $r = \Omega(\log |D| \log\log |D|)$.

Proof. Given a $(1, \alpha \cdot \log |D|)$-approximation algorithm for the Steiner version of the MDST problem in directed graphs, the construction of Theorem 1 with $m = \log |D|$ yields a $\rho$-approximation algorithm for the multicast problem with
$$\rho = O\left(\alpha \cdot \frac{\log |D|}{\log\log |D|}\right).$$
From Theorem 3, $\alpha = \Omega(\log\log |D|)$. □

The proof of Corollary 1 also shows that any polynomial-time $\rho$-approximation algorithm for the Steiner version of the MDST problem in directed graphs satisfies $\rho = \Omega(\log\log |D|)$. The next result shows how to obtain a better lower bound with an ad hoc construction.

Theorem 5. Unless $NP \subset DTIME(n^{\log\log n})$, the optimal solution of the Steiner version of the MDST problem in directed graphs is not approximable in polynomial time within $(1 - \epsilon) \ln |D|$ for any $\epsilon > 0$.

Proof. By a simple reduction from MSC (see [?] for more details). □

Remark. Again, since Feige's Lemma still holds for instances $(S, \mathcal{C})$ such that $|\mathcal{C}| \leq |S|$, Theorem 5 yields an $\Omega(\log n)$ lower bound for the approximation ratio of the Steiner version of the MDST problem in digraphs.

To our knowledge, there is no polynomial-time $O(\log n)$-approximation algorithm for the Steiner version of the MDST problem in digraphs.



4 Gossip Problem 

Trivially, we have $g(G, D) \geq |D| - 1$, since every node must receive $|D| - 1$ pieces of information. The next lemma improves this lower bound.

Lemma 7. Given $G = (V, E)$ and $D \subseteq V$, $g(G, D) \geq \frac{1}{2} |D| (\Delta_{\min} - 1)$.

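Proof. Let $\mathcal{P}$ be an optimal gossip protocol in $D$. For every source $s \in D$, let $G_s$ be the subgraph of $G$ induced by the union of all the calls of $\mathcal{P}$ carrying the message of $s$. $G_s$ spans $D$, and $\mathcal{P}$ completes in at least $\frac{1}{2} \max_{v \in V} \sum_{s \in D} \deg_{G_s}(v)$ rounds. Let us show that this sum is in $\Omega(|D| \Delta_{\min})$. For that purpose, for any $S \subseteq V$, define $c(G \setminus S)$ as the number of connected components of $G \setminus S$ containing at least one node in $D$. Then let $S^* \subseteq V$ be such that
$$\frac{c(G \setminus S^*)}{|S^*|} = \max_{\emptyset \neq S \subseteq V} \frac{c(G \setminus S)}{|S|}.$$
It is shown in [?] that $c(G \setminus S^*)/|S^*| \leq \Delta_{\min} \leq 1 + c(G \setminus S^*)/|S^*|$. Let $\mathcal{F}$ be any family of $|D|$ subgraphs of $G$, each of them spanning $D$. We have
$$\sum_{H \in \mathcal{F}} \sum_{v \in S^*} \deg_H(v) \geq |D| \, c(G \setminus S^*).$$
Therefore there exists $v^* \in S^*$ such that
$$\sum_{H \in \mathcal{F}} \deg_H(v^*) \geq |D| \, \frac{c(G \setminus S^*)}{|S^*|} \geq |D| (\Delta_{\min} - 1).$$
Therefore $\max_{v \in V} \sum_{s \in D} \deg_{G_s}(v) \geq |D| (\Delta_{\min} - 1)$, which completes the proof.

□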
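For intuition on the quantity $c(G \setminus S)/|S|$ used in the proof, the brute-force Python sketch below computes its maximum over all non-empty $S \subseteq V$ on small graphs. It is exponential in $|V|$ and meant only for experiments. The graph encoding (a dictionary of neighbour sets) and the function names are assumptions. On the star with four leaves and $D = V$, it returns 4, attained by the center, which matches $\Delta_{\min} = 4$.

```python
from fractions import Fraction
from itertools import combinations


def d_components(adj, D, S):
    """Number of connected components of G minus S that contain a node of D."""
    seen, count = set(S), 0
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:                      # DFS over the component of `start`
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if comp & D:
            count += 1
    return count


def max_ratio(adj, D):
    """Brute-force maximum of c(G minus S) / |S| over non-empty S (small graphs)."""
    nodes = list(adj)
    best, arg = Fraction(-1), None
    for r in range(1, len(nodes) + 1):
        for S in combinations(nodes, r):
            val = Fraction(d_components(adj, D, S), r)
            if val > best:
                best, arg = val, set(S)
    return best, arg
```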

Theorem 6. Let $t(n)$ be the complexity of a $(1, 1)$-approximation algorithm for the MDST problem in $n$-node graphs. There exists a $t(n)$-time $O(1)$-approximation algorithm for the gossip problem.

Again, by Lemma 2, $t(n)$ can be chosen to be polynomial.

Proof. Let $G = (V, E)$ be any graph, and let $D \subseteq V$. Let $T$ be a tree of degree $\Delta$ that spans $D$ in $G$. Let $s$ be any node of $D$. Gossiping in $D$ can be performed in two phases. Phase 1: accumulation in $s$ of all the messages of $D$. Phase 2: broadcasting from $s$ to $D$ of the $|D|$ messages accumulated during Phase 1. Phase 1 requires $|D| - 1$ rounds. By pipelining along the edges of a tree $T$ of maximum degree $\Delta \leq \Delta_{\min} + 1$, Phase 2 can be performed in $O(|D| \Delta_{\min})$ rounds. From Lemma 7, this protocol is optimal within a constant multiplicative factor. (For more details, see [?].) □



5 Conclusion and Further Research 

Many problems remain open: (1) Can we approximate the optimal solution of the multicast problem in graphs within a constant factor? (2) Can we approximate the optimal solution of the multicast problem in digraphs within an $O(\log |D|)$ factor? (3) Can we describe a polynomial-time approximation scheme for the gossip problem in graphs? In addition to these problems, we want to point out that, although approximation algorithms are usually preferred to heuristics, there are known heuristics for broadcasting and gossiping in the local model that perform very well in general. We therefore believe that the following heuristic for broadcasting under the vertex-disjoint paths mode would be worth experimenting with:

While |D| > 1 do

(1) Find the maximum number of vertex-disjoint paths connecting
pairs of nodes in D;

(2) For each of these paths, select one extremity of the path and
remove the other from D;

End.

Note that Instruction (1) can be performed in polynomial time by using the algorithm in [?]. A broadcast protocol is obtained from this iteration by reversing the process, i.e., if $D_i$ is the set of nodes selected at the $i$th iteration, and if $D_k = \{u\}$ is the set of nodes remaining after the last iteration, then broadcasting from $u$ consists of (1) $u$ calls $v$, and (2) for $i = k$ down to 1, every node in $D_i$ calls its “matched node” in $D_{i-1}$.
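The reversal step can be sketched without committing to a particular disjoint-paths algorithm for Instruction (1). The Python below assumes each iteration is recorded as a list of (kept, removed) pairs, one per path found. It checks that the iteration ends with a single node $u$, then replays the iterations backwards so that, in each round, every kept node calls the node it was paired with. The calls of a round are realized by that iteration's vertex-disjoint paths. The initial call from $u$ to $v$ mentioned above is not modelled, and all names are hypothetical.

```python
def reverse_to_broadcast(D, iterations):
    """Turn the pairing iterations of the heuristic into broadcast rounds.

    D: initial destination set; iterations[i]: list of (kept, removed) pairs
    chosen at iteration i + 1 (one pair per vertex-disjoint path).
    Returns (u, rounds) where u is the last remaining node and rounds lists
    the (caller, callee) pairs, last iteration first.
    """
    remaining = set(D)
    for pairs in iterations:
        touched = set()
        for kept, removed in pairs:
            pair = {kept, removed}
            if kept == removed or pair & touched or not pair <= remaining:
                raise ValueError("each pair must use two distinct remaining nodes once")
            touched |= pair
        remaining -= {removed for _, removed in pairs}
    if len(remaining) != 1:
        raise ValueError("the iteration must stop with a single node")
    (u,) = remaining
    rounds = [list(pairs) for pairs in reversed(iterations)]
    informed = {u}
    for rnd in rounds:                    # every caller must already be informed
        if any(kept not in informed for kept, _ in rnd):
            raise ValueError("a caller is not yet informed")
        informed |= {removed for _, removed in rnd}
    return u, rounds
```

For $D = \{a, b, c, d\}$, pairing $(a, b)$ and $(c, d)$ and then $(a, c)$ leaves $u = a$. The broadcast is then $a \to c$, followed by $a \to b$ and $c \to d$.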
Abstract. We study the completion time of broadcast operations on Static Ad-Hoc Wireless Networks in the presence of unpredictable and dynamical faults. As for oblivious fault-tolerant distributed protocols, we provide an $\Omega(Dn)$ lower bound, where $n$ is the number of nodes of the network and $D$ is the source eccentricity in the fault-free part of the network. Rather surprisingly, this lower bound implies that the simple Round-Robin protocol, working in $O(Dn)$ time, is an optimal fault-tolerant oblivious protocol. Then, we demonstrate that networks of $o(n/\log n)$ maximum in-degree admit faster oblivious protocols. Indeed, we derive an oblivious protocol having $O(D \min\{n, \Delta \log n\})$ completion time on any network of maximum in-degree $\Delta$. Finally, we address the question of whether adaptive protocols can be faster than oblivious ones. We show that the answer is negative, at least in the general setting: we indeed prove an $\Omega(Dn)$ lower bound when $D = \ldots$. This clearly implies that no (adaptive) protocol can achieve, in general, $o(Dn)$ completion time.



1 Introduction 

Static ad-hoc wireless networks (in short, wireless networks) have been the subject of several works in recent years due to their potential applications in scenarios such as battlefields, emergency disaster relief, and any situation in which it is very difficult (or impossible) to provide the necessary infrastructure [?]. As in other network models, a challenging task is to enable fast and reliable communication.

A wireless network can be modeled as a directed graph $G$ where an edge $(u, v)$ exists if and only if $u$ can communicate with $v$ in one hop. Communication between two stations that are not adjacent can be achieved by multi-hop transmissions. A useful (and sometimes unavoidable) paradigm of wireless communication is the structuring of communication into synchronous time-slots. This paradigm is commonly adopted in the practical design of protocols and hence its use in theoretical analysis is well motivated [...]. In every time-slot,
each active node may perform local computations and either transmit a message 
along all of its outgoing edges or try to recover messages from all its incoming 
edges (the last two operations are carried out by means of an omnidirectional 
antenna). This feature is extremely attractive in its broadcast nature: a single 
transmission by a node could be received by all its neighbors within one time-slot. However, since a single radio frequency is typically used, when two or more neighbors of a node are transmitting in the same time-slot, a collision occurs and the message is lost. So, a node can recover a message from one of its incoming edges if and only if this edge is the only one bringing in a message.

One of the fundamental tasks in wireless network communication is the broadcast operation. It consists of transmitting a message from one source node to all the nodes. Most of the proposed broadcast protocols in wireless networks concern the case in which the network is fault-free. However, wireless networks are typically adopted in scenarios where unpredictable node and link faults happen very frequently.

Node failures happen when some hardware or software component of a station 
does not work, while link failures are due to the presence of a new (artificial or 
natural) hurdle that does not allow the communication along that link. Typically, 
while it is reasonable to assume that nodes know the initial topology of the 
network, they know nothing about the duration and the location (in the network) 
of the faults. Such faults may clearly happen at any instant, even during the execution of a protocol. In the sequel, such faults will be called dynamical faults or, simply, faults.

The (worst-case) completion time of a fault-tolerant broadcasting protocol on a graph $G$ is defined as the maximum number (over all possible fault patterns) of time-slots required to inform all nodes in the fault-free part of the network which are reachable from the source (a more formal definition will be given in Sect. 1.3). The aim of this paper is thus to investigate the completion time of such broadcast protocols in the presence of dynamical faults.



1.1 Previous Results 

(Fault-free) broadcasting. We will mention only the best results which are presently known for the fault-free model. An $O(D + \log^5 n)$ upper bound on the completion time for $n$-node networks of source eccentricity $D$ is proved in [...]. The source eccentricity is the maximum oriented distance (i.e., number of hops) from the source $s$ to a reachable node of the network. Notice that $D$ is a trivial lower bound for the broadcast operation.

In [...], the authors give a protocol that completes broadcasting within time $O(D \log \Delta \log(n/D))$, where $\Delta$ denotes the maximal in-degree of the network. This bound cannot be improved in general: [...] provides an $\Omega(\log^2 n)$ lower bound that holds for graphs of maximal eccentricity $D = 2$.

In [...], the authors show that scheduling an optimal broadcast is NP-hard. The APX-hardness of the problem is proved in [...].
Permanent-fault tolerant broadcasting. A node has a permanent fault if it never sends or receives messages since the beginning of the execution of the protocol. In [...], the authors consider the broadcasting operation in the presence of permanent unknown node faults for two restricted classes of networks: linear and square (or hexagonal) meshes. They consider both oblivious and adaptive protocols. In the former case, all transmissions are scheduled in advance; in particular, the action of a node in a given time-slot is independent of the messages received so far. In the latter case, nodes can decide their action also depending on the messages received so far. For both cases, the authors assume the existence of a bound $t$ on the number of faults and, then, they derive an $O(D + t)$ bound for oblivious protocols and an $O(D + \log \min\{\Delta, t\})$ bound for adaptive protocols, where, in this case (and in the sequel), $D$ denotes the source eccentricity in the fault-free part of the network, i.e., the residual eccentricity.

More recently, the issue of permanent-fault tolerant broadcasting on general networks has been studied in [4,5,6,8]. Indeed, in these papers, several lower and upper bounds on the completion time of broadcasting are obtained on the unknown fault-free network model. A wireless network is said to be unknown when every node knows nothing about the network but its own label. Even though this has never been observed, it is easy to show that a broadcasting protocol for unknown fault-free networks is also a permanent-fault tolerant protocol for general (known) networks and vice versa. So, the results obtained in the unknown model immediately apply to the permanent-fault tolerance issue. In particular, one of the results in [...] can be interpreted as showing the existence of an infinite family of networks for which any permanent-fault tolerant protocol is forced to perform $\Omega(n \log D)$ time-slots to complete broadcast. The best general upper bound for permanent-fault tolerant protocols is $O(n \log^2 n)$ [...]. This protocol is thus almost optimal when $D = \Omega(n^\alpha)$ for any constant $\alpha > 0$. In [...], the authors provide a permanent-fault tolerant protocol having $O(D \Delta \log^2 n)$ completion time on any network of maximum in-degree $\Delta$.

Other models. A different kind of fault-tolerant broadcasting is studied in [...]: the authors introduce fault-tolerant protocols that work under the assumption that all faults are eventually repaired. The protocols are not analyzed from the point of view of worst-case completion time. Finally, in [...], the case in which broadcast messages may be corrupted according to some probability distribution is studied.

Dynamical-fault tolerant broadcasting. We observe that permanent faults are special cases of dynamical faults and, moreover, we emphasize that none of the above protocols works in the presence of dynamical faults. This is mainly due to the collisions caused by any unpredictable wake-up of a faulty node/link during the protocol execution.

To the best of our knowledge, broadcasting on arbitrary wireless networks in 
presence of dynamical faults has never been studied before. 



1.2 Our Results 

Oblivious protocols. A simple oblivious (dynamical-)Fault-tolerant Distributed Broadcasting (FDB) protocol relies on Round Robin scheduling: given an $n$-node network $G$ and a source node $s$, the protocol runs a sequence of consecutive identical phases; each phase consists of $n$ time-slots and, during the $i$-th time-slot ($i = 1, \ldots, n$), node $i$, if informed¹, acts as transmitter while all the other nodes work as receivers. It is not hard to show that, for any fault pattern yielding residual source eccentricity $D$, the Round Robin protocol completes broadcasting on $G$ after $D$ phases, so after $Dn$ time-slots (in every phase each informed node transmits alone exactly once, so every node at residual distance $i$ is informed by the end of phase $i$). One may think that this simple oblivious FDB protocol is not efficient (or, at least, not optimal) since it never exploits simultaneous transmissions.
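A minimal fault-free sketch of the Round Robin schedule follows, only to make the phase structure concrete. It is our own illustration, not code from the paper: nodes are assumed to be labelled 0, ..., n-1, slot t is assigned to node (t-1) mod n, and the hypothetical function `round_robin_broadcast` ignores faults entirely.

```python
from collections import deque

def round_robin_broadcast(n, out_edges, source):
    """Fault-free simulation of Round Robin on nodes 0..n-1.

    out_edges[v] is the set of out-neighbours of v. In slot t the only
    potential transmitter is node (t - 1) % n, so collisions never occur.
    Returns the number of slots until every node reachable from the
    source is informed.
    """
    # Nodes the broadcast must reach: those reachable from the source.
    reachable = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in out_edges[v]:
            if w not in reachable:
                reachable.add(w)
                queue.append(w)

    informed = {source}
    t = 0
    while informed != reachable:
        t += 1
        v = (t - 1) % n          # node scheduled in this slot
        if v in informed:        # uninformed nodes stay silent
            informed |= out_edges[v]
    return t
```

On the reversed path 3 → 2 → 1 → 0 (D = 3, n = 4) the sketch needs 10 slots, within the Dn = 12 bound; each phase pushes the message at least one more hop.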

Rather surprisingly, we show that, for any $n$ and for any $D < n$, it is possible to define an $n$-node network $G$ such that any oblivious FDB protocol requires $\Omega(Dn)$ time-slots to complete broadcast on $G$. It thus follows that the Round Robin protocol is optimal on general networks. The proof departs significantly from the techniques used in all previous related works (such as those used for the case of permanent faults and based on selective families [4,8]): it in fact relies on a tight lower bound on the length of $D$-sequences (see Def. 1), a combinatorial tool that might have further applications in scheduling theory.

We then show that a broad class of wireless networks admits an oblivious FDB protocol which is faster than the Round Robin protocol. Indeed, we exploit small ad-hoc strongly-selective families, a variant of strongly-selective families (also known as superimposed codes [...]), in order to develop an oblivious FDB protocol that completes broadcasting within $O(D \min\{n, \Delta \log n\})$ time-slots, where $\Delta$ is the maximum in-degree of the input network. This protocol is thus faster than the Round Robin protocol for all networks such that $\Delta = o(n/\log n)$, and it is almost optimal for constant $\Delta$.

Adaptive protocols. In adaptive FDB protocols, nodes have the ability to decide their own scheduling as a function of the messages received so far. A natural and interesting question is whether adaptive FDB protocols are faster than oblivious ones. We give a partial negative answer to this question. We strengthen the connection between strong selectivity and the task of fault-tolerant broadcasting: we indeed exploit the tight lower bound on the size of strongly-selective families, given in [...], to derive an $\Omega(Dn)$ lower bound for adaptive FDB protocols when $D = \Theta(\sqrt{n})$. This implies that no (adaptive) FDB protocol can achieve $o(Dn)$ completion time on arbitrary networks.



1.3 Preliminaries 

The aim of this subsection is to formalize the concept of FDB protocol and its 
completion time. 

According to the fault-tolerance model adopted in the literature [...], an FDB protocol for a graph $G$ is a broadcasting protocol that, for any source $s$ and for any (node/link) fault pattern $F$, guarantees that every node which is reachable from $s$ in the residual subgraph $G_F$ will receive the source message. A fault pattern $F$ is a function that maps every time-slot $t$ to the subset $F(t)$ of nodes and links that are faulty at time-slot $t$. The residual subgraph $G_F$ is the graph obtained from $G$ by removing all those nodes and links that belong to $F(t)$ for some time-slot $t$ during the execution of the protocol. The completion time of the protocol on a graph $G$ and source $s$ is the maximal (over all possible fault patterns) number of time-slots needed to perform the above task.

¹ A node is informed during a time-slot $t$ if it has received the source message in some time-slot $t' < t$.

This definition implies that nodes that are not reachable from the source 
in the residual subgraph are not considered in the analysis of the completion 
time of FDB protocols. We emphasize that any attempt to consider a larger 
residual subgraph makes the worst-case completion time of any FDB protocol 
unbounded. 
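The definitions of $G_F$ and of the residual source eccentricity can be made concrete with a short sketch. The function name `residual_eccentricity` and the encoding of $F$ as a Python list of sets mixing node labels and edge pairs are our assumptions; the code removes every node or edge that is faulty in at least one slot and runs a breadth-first search from the source.

```python
from collections import deque

def residual_eccentricity(nodes, edges, source, fault_pattern):
    """Return (eccentricity of source in G_F, nodes reachable from source in G_F).

    nodes: iterable of node labels (ints here); edges: set of directed pairs (u, v).
    fault_pattern: list of sets; entry t is F(t), the nodes and/or edges that
    are faulty at time-slot t during the execution.
    """
    # An item is removed from G_F if it is faulty in at least one time-slot.
    ever_faulty = set().union(*fault_pattern)
    alive = {v for v in nodes if v not in ever_faulty}
    if source not in alive:
        return 0, set()
    out = {v: [] for v in alive}
    for (u, v) in edges:
        if (u, v) not in ever_faulty and u in alive and v in alive:
            out[u].append(v)

    # BFS: distance = minimum number of hops from the source in G_F.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in out[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()), set(dist)
```

For example, with edges 0→1, 1→2, 0→2, a single slot in which edge (0, 2) is down already raises the residual eccentricity from 1 to 2, because $G_F$ drops every item that is faulty at any time.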

2 Oblivious Fault-Tolerant Protocols 

2.1 Lower Bound 

An oblivious protocol for an $n$-node network can be represented as a sequence $\mathcal{S} = (S_1, S_2, \ldots, S_l)$ of transmissions, where $S_i \subseteq [n]$ is the set of nodes that transmit during the $i$-th time-slot and $l$ denotes the worst-case (w.r.t. all possible fault patterns) completion time. Wlog, we also assume that, if a node belongs to $S_t$ (so it should transmit at time-slot $t$) but it has not received the source message during the first $t - 1$ time-slots, then it will send no message at time-slot $t$.
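The following sketch (ours, with hypothetical names) simulates an oblivious protocol $(S_1, \ldots, S_l)$ under the collision rule of the Introduction and the wlog convention above: only informed members of $S_t$ transmit, and a node becomes informed exactly when one message reaches it over an edge that is up in that slot. Dynamic faults are restricted here, for simplicity, to edges given as a function of the slot number.

```python
def run_oblivious(n, edges, source, schedule, faulty_edges=lambda t: set()):
    """Simulate an oblivious protocol (S_1, ..., S_l) on nodes 0..n-1.

    edges: set of directed pairs (u, v); schedule[t - 1] is S_t;
    faulty_edges(t) returns the set of edges that are down at slot t.
    Returns a dict: informed node -> slot in which it received the message
    (the source is mapped to 0).
    """
    informed_at = {source: 0}
    for t, s_t in enumerate(schedule, start=1):
        # Wlog, a member of S_t that is still uninformed stays silent (and listens).
        transmitters = {v for v in s_t if v in informed_at}
        down = faulty_edges(t)
        newly = []
        for w in range(n):
            if w in informed_at:
                continue
            # Messages arriving at w over edges that are up in this slot.
            arriving = [u for u in transmitters
                        if (u, w) in edges and (u, w) not in down]
            if len(arriving) == 1:   # two or more arrivals collide and are lost
                newly.append(w)
        for w in newly:              # newly informed nodes transmit from slot t+1 on
            informed_at[w] = t
    return informed_at
```

In the four-node example of the test, nodes 1 and 2 collide at node 3 in slot 2; a dynamic fault on edge (2, 3) in that slot actually removes the collision, which is why dynamical faults can break protocols designed for the fault-free case.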

In order to prove the lower bound, we consider the complete directed graphs $K_n$, $n \ge 1$. We first show that any oblivious FDB protocol $\mathcal{S}$ on $K_n$ must satisfy the following property.

Definition 1. A sequence $\mathcal{S} = (S_1, S_2, \ldots, S_l)$ of subsets of $[n]$ is called a $D$-sequence for $[n]$ if for each subset $H$ of $[n]$ with $|H| \le D$ and each permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{|H|})$ of $H$, there exists a subsequence $(S_{i_1}, S_{i_2}, \ldots, S_{i_{|H|-1}})$ of $\mathcal{S}$ such that

$$\pi_j \in S_{i_j} \quad\text{and}\quad S_{i_j} \subseteq H, \qquad \text{for } 1 \le j < |H|.$$
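Definition 1 can be checked by brute force on tiny instances. The sketch below is ours and follows the reading of the definition given above (one set for each of $\pi_1, \ldots, \pi_{|H|-1}$); it relies on the standard fact that greedily matching each $\pi_j$ to the earliest admissible later set finds such a subsequence whenever one exists. Its running time is exponential in $n$.

```python
from itertools import combinations, permutations

def is_d_sequence(n, seq, D):
    """Brute-force test of Definition 1 for the ground set {0, ..., n-1}.

    For every H with |H| <= D and every ordering pi of H, look for indices
    i_1 < ... < i_{|H|-1} with pi_j in S_{i_j} and S_{i_j} a subset of H.
    """
    seq = [frozenset(s) for s in seq]
    for size in range(1, D + 1):
        for H in combinations(range(n), size):
            H_set = frozenset(H)
            inside = [s for s in seq if s <= H_set]   # only sets contained in H
            for pi in permutations(H):
                pos = 0
                for target in pi[:-1]:               # pi_1 .. pi_{|H|-1}
                    # Greedy: earliest remaining set that contains pi_j.
                    while pos < len(inside) and target not in inside[pos]:
                        pos += 1
                    if pos == len(inside):
                        return False
                    pos += 1                         # the next set must come later
    return True
```

For instance, $D-1$ consecutive Round Robin rounds of singletons form a $D$-sequence, whereas a single round does not once $D \ge 3$.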

Lemma 1. For every $n$ and every $D < n$, if $\mathcal{S}$ is an oblivious protocol which completes broadcast on the graph $K_n$ for every fault pattern yielding a residual source eccentricity at most $D$, then $\mathcal{S}$ is a $D$-sequence for $[n]$.

Proof. Let $\mathcal{S} = (S_1, S_2, \ldots, S_l)$ be an oblivious FDB protocol which completes broadcast on $K_n$ for every fault pattern yielding a residual subgraph of source eccentricity at most $D$. Let us consider a subset $H$ of $[n]$ with $|H| \le D$ and a permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_{|H|})$ of $H$. We will define a fault pattern $F$ of $K_n$ and a source node $\pi_0$ such that

1. $\pi_0$ has residual eccentricity $|H|$;

2. the fact that $\mathcal{S}$ completes broadcast for the pattern $F$ implies the existence of a subsequence $(S_{i_1}, \ldots, S_{i_{|H|-1}})$ of $\mathcal{S}$ such that $\pi_j \in S_{i_j}$ and $S_{i_j} \subseteq H$, for $1 \le j < |H|$.

Let us choose the source $\pi_0$ as a node in $[n] \setminus H$ and consider the set of (directed) edges

$$A = \bigcup_{0 \le i < |H|} (\pi_i, \pi_{i+1}).$$

The pattern $F$ is defined as follows: for any $i \ge 1$ and for any $u \in [n]$ with $u \ne \pi_{i-1}$, the edge $(u, \pi_i)$ is faulty at time-slot $t$ if and only if $\pi_{i-1} \notin S_t$ or $\pi_{i-1}$ is not yet informed at that time-slot. In other words, the edge $(u, \pi_i)$ is faulty whenever $\pi_{i-1}$ cannot inform $\pi_i$.

Observe that the edges in $A$ are never faulty. Moreover, it is easy to verify that the residual source eccentricity is $|H|$. Since, by definition, the protocol $\mathcal{S}$ completes the broadcast on any residual subgraph of source eccentricity at most $D$, the protocol $\mathcal{S}$ completes the broadcast on the residual subgraph of $K_n$ yielded by $F$. So, for any $i > 0$, there is a time-slot in which $\pi_i$ gets informed. By definition of $F$, the only node that can inform $\pi_{i+1}$ is $\pi_i$. Since all nodes in $[n] \setminus H$ are informed during the first time-slot in which the source $\pi_0$ transmits, $\pi_i$ can inform $\pi_{i+1}$ at a time-slot $t$ only if $S_t \subseteq \{\pi_1, \pi_2, \ldots, \pi_{|H|}\}$. Thus, there must exist a subsequence $(S_{i_1}, S_{i_2}, \ldots, S_{i_{|H|-1}})$ of $\mathcal{S}$ such that $\pi_j \in S_{i_j}$ and $S_{i_j} \subseteq H$, for $1 \le j < |H|$.

□
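The fault pattern $F$ used in the proof of Lemma 1 can also be simulated. In this sketch (ours; the function name and the 0-based node labels are assumptions) the adversary keeps every in-edge of $\pi_i$ other than $(\pi_{i-1}, \pi_i)$ down unless $\pi_{i-1}$ is informed and scheduled, so $\pi_i$ can only be reached through the chain $\pi_0, \pi_1, \ldots$.

```python
def lemma1_adversary(n, schedule, source, pi):
    """Run an oblivious schedule on K_n against the fault pattern F of Lemma 1.

    Nodes are 0..n-1, pi = (pi_1, ..., pi_|H|) is an ordering of H and
    source = pi_0 lies outside H. At slot t every edge (u, pi_i) with
    u != pi_{i-1} is down iff pi_{i-1} is not in S_t or not yet informed.
    Returns the (1-based) slot in which pi_|H| gets informed, or None.
    """
    chain = [source] + list(pi)                       # pi_0, pi_1, ..., pi_|H|
    pred = {chain[i]: chain[i - 1] for i in range(1, len(chain))}
    informed = {source}
    for t, s_t in enumerate(schedule, start=1):
        # Only informed members of S_t transmit (uninformed ones stay silent).
        transmitters = {v for v in s_t if v in informed}
        newly = set()
        for w in range(n):
            if w in informed:
                continue
            if w in pred and pred[w] not in transmitters:
                # All edges (u, w), u != pi_{i-1}, are down, and pi_{i-1} is silent.
                senders = set()
            else:
                # Either w is outside H (no faults on its in-edges) or
                # pi_{i-1} transmits, in which case every in-edge of w is up.
                senders = transmitters
            if len(senders) == 1:
                newly.add(w)
        informed |= newly
        if chain[-1] in informed:
            return t
    return None
```

Against the ordering $\pi = (3, 2, 1)$ on $K_4$, Round Robin needs 7 slots to reach $\pi_3$, and one round of 4 slots does not suffice, while the ordering $(1, 2, 3)$ is served within the first round.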

We now prove a tight lower bound on the length of a D-sequence. To this 
aim, we need the following technical result. 

Lemma 2. If $\mathcal{S}$ is a $D$-sequence for $[n]$, then

$$\sum_{S \in \mathcal{S}} |S| = \Omega(Dn).$$

Proof. Let $\mathcal{S} = (S_1, S_2, \ldots, S_l)$ be a $D$-sequence for $[n]$ and consider the sequence $(k_1, k_2, \ldots, k_{D-1})$ defined (by induction) as follows:

$$k_1 = \min\Bigl\{h \;\Big|\; [n] = \bigcup_{1 \le j \le h} S_j\Bigr\}.$$

By definition of $D$-sequence, $k_1$ must exist and so there exists (at least one) element in the set

$$S_{k_1} \setminus \bigcup_{1 \le j < k_1} S_j.$$

Then, let $\pi_1$ be any of such elements. We now assume that the indices $k_1, k_2, \ldots, k_i$ and the elements $\pi_1, \pi_2, \ldots, \pi_i$ are already defined; then

$$k_{i+1} = \min\Bigl\{h \;\Big|\; ([n] - \{\pi_1, \ldots, \pi_i\}) \subseteq \bigcup_{k_i < j \le h} S_j\Bigr\}$$

(again, we notice that $k_{i+1}$ must exist since $\mathcal{S}$ is a $D$-sequence and $i + 1 < D$) and let $\pi_{i+1}$ be any element in

$$([n] - \{\pi_1, \ldots, \pi_i\}) \cap \Bigl(S_{k_{i+1}} \setminus \bigcup_{k_i < j < k_{i+1}} S_j\Bigr).$$

By definition of the above sequence, it holds that

$$\Bigl|\bigcup_{j=1}^{k_1} S_j\Bigr| = n \quad\text{and}\quad \Bigl|\bigcup_{k_i < j \le k_{i+1}} S_j\Bigr| \ge n - i, \qquad \text{for any } i = 1, \ldots, D-2.$$
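It thus follows that

$$\sum_{S \in \mathcal{S}} |S| \ge \sum_{j=1}^{k_1} |S_j| + \sum_{i=1}^{D-2} \sum_{k_i < j \le k_{i+1}} |S_j| \ge n + \sum_{i=1}^{D-2} (n - i) = \Omega(Dn),$$

where the last step uses $D \le n$: then $\sum_{i=1}^{D-2}(n - i) \ge (D-2)n/2$, so the total is at least $Dn/2$.

□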
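To see the construction of the proof of Lemma 2 at work, the sketch below (ours; names are hypothetical) replays it on a given sequence: it finds each $k_{i+1}$ as the first index at which the sets after $k_i$ cover $[n] - \{\pi_1, \ldots, \pi_i\}$, and picks $\pi_{i+1}$ among the elements covered last (the proof allows any of them; the sketch takes the smallest).

```python
def lemma2_blocks(n, seq, D):
    """Replay the construction in the proof of Lemma 2.

    seq is a D-sequence for {0, ..., n-1} (sets listed in order). Returns the
    indices k_1 < ... < k_{D-1} (1-based) and the chosen elements pi_1, ...,
    pi_{D-1}. Raises ValueError if some k_i does not exist.
    """
    ks, pis = [], []
    start = 0                            # sets with index > k_i are still unused
    for i in range(D - 1):
        need = set(range(n)) - set(pis)  # [n] minus {pi_1, ..., pi_i}
        covered = set()
        for h in range(start, len(seq)):
            new = (set(seq[h]) & need) - covered
            covered |= new
            if covered == need:
                ks.append(h + 1)         # k_{i+1}, 1-based
                pis.append(min(new))     # an element first covered by S_{k_{i+1}}
                start = h + 1
                break
        else:
            raise ValueError("not a D-sequence: k_%d does not exist" % (i + 1))
    return ks, pis
```

On two Round Robin rounds over $[4]$ with $D = 3$, the blocks are $S_1..S_4$ and $S_5..S_7$, of total size $7 = n + (n - 1)$, which matches the bound $n + \sum_{i=1}^{D-2}(n - i)$ exactly.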



Lemma 3. If $\mathcal{S}$ is a $D$-sequence for $[n]$, then $|\mathcal{S}| = \Omega(Dn)$.

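Proof. We count in two different ways the following number

$$N = \bigl|\{(H, (S, x)) \mid H \subseteq [n],\ S \in \mathcal{S},\ S \subseteq H \text{ and } x \in S\}\bigr|.$$

Let us consider $H$ and the subsequence $\mathcal{S}_H$ of $\mathcal{S}$ obtained by deleting from $\mathcal{S}$ all the sets $S$ such that $S \not\subseteq H$. By definition of $D$-sequence, $\mathcal{S}_H$ must form a $D$-sequence for $H$. Hence, from Lemma 2, for every $H$ with $|H| \ge \lceil n/2 \rceil$ we have at least $cD|H|$ ways of choosing $(S, x)$, for a constant $c > 0$ (if $|H| < D$, then $\mathcal{S}_H$ is an $|H|$-sequence for $H$ and $|H| \ge n/2 > D/2$, which only changes the constant). Thus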



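$$N \ge \sum_{\substack{H \subseteq [n] \\ |H| \ge \lceil n/2 \rceil}} cD|H| = cD \sum_{i=\lceil n/2 \rceil}^{n} i \binom{n}{i} \ge \frac{cDn}{2} \sum_{i=\lceil n/2 \rceil}^{n} \binom{n}{i} \ge \frac{cDn}{4}\, 2^n. \qquad (1)$$

Now let $(S, x)$ be fixed. There are $2^{n-|S|}$ subsets $H$ of $[n]$ such that $S \subseteq H$. Thus, since $|S|/2^{|S|} \le 1/2$ for every integer $|S| \ge 0$,

$$N = \sum_{S \in \mathcal{S}} |S| \cdot 2^{n-|S|} = 2^n \sum_{S \in \mathcal{S}} \frac{|S|}{2^{|S|}} \le 2^n \sum_{S \in \mathcal{S}} \frac{1}{2} = 2^{n-1} |\mathcal{S}|. \qquad (2)$$

Finally, by comparing Eqs. (1) and (2) we derive

$$2^{n-1} |\mathcal{S}| \ge \frac{cDn}{4}\, 2^n, \quad \text{so} \quad |\mathcal{S}| \ge \frac{cDn}{2}.$$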

We are now able to show the following 



Theorem 1. Let $n > 0$ and $1 \le D \le n - 1$. For every oblivious FDB protocol on the graph $K_n$, there exist a source $s$ and a fault pattern $F$ that force the protocol to perform $\Omega(Dn)$ time-slots.

Proof. Let $\mathcal{S}$ be any oblivious FDB protocol for $K_n$ and let $T$ be the maximum completion time of $\mathcal{S}$ over all possible residual subgraphs of source eccentricity at most $D$. Then, from Lemma 1, the first $T$ transmission sets of $\mathcal{S}$ must be a $D$-sequence for $[n]$. From Lemma 3, it must hold that $T = \Omega(Dn)$.

□ 



Corollary 1. For any $n > 0$, any oblivious FDB protocol requires $\Omega(n^2)$ time-slots in the worst case to complete broadcasting on the graph $K_n$.
2.2 Efficient Protocols for Networks of “Small” In-Degree

In this subsection, we show that networks of maximum in-degree $\Delta = o(n/\log n)$ admit oblivious FDB protocols which are faster than the Round Robin one. To this aim, we need the following combinatorial tool.

Definition 2. Let $\mathcal{S}$ and $\mathcal{N}$ be families of sets. The family $\mathcal{S}$ is strongly-selective for $\mathcal{N}$ if for every set $N \in \mathcal{N}$ and for every element $x \in N$ there exists a set $S \in \mathcal{S}$ such that $N \cap S = \{x\}$.
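Definition 2 translates directly into a membership test. The helper `is_strongly_selective` below is a hypothetical brute-force sketch, not part of the paper; both families are given as Python lists of sets.

```python
def is_strongly_selective(family, targets):
    """Definition 2: for every N in targets and x in N, some S in family has N & S == {x}."""
    family = [frozenset(s) for s in family]
    return all(any(frozenset(N) & S == {x} for S in family)
               for N in targets for x in N)
```

The family of all singletons passes for any target family, which is why the size of a strongly-selective family never needs to exceed $n$; the whole ground set passes only when every target has at most one element.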

Strongly-selective families for the family of all subsets of $[n]$ having size at most $\Delta$ have been recently used to develop multi-broadcast protocols on the unknown model [...]. The following protocol instead uses strong selectivity for the family $\mathcal{N}$ consisting of the sets of in-neighbors of the nodes of the input network $G$. Precisely, for each node $v$ of $G$, let $N(v) \subseteq [n]$ be the set of its in-neighbors and let

$$\mathcal{N} = \{N(v) \mid v \in V\}.$$

Let $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ be any (arbitrarily ordered) strongly-selective family for $\mathcal{N}$.

Description of Protocol BROAD. The protocol consists of a sequence of phases.

- In the first phase the source sends its message.

- All successive phases are identical and each of them consists of $m$ time-slots. At time-slot $j$ of every phase, any informed node $v$ sends the source message if and only if it belongs to $S_j$. All the remaining nodes act as receivers.
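A sketch of Protocol BROAD under dynamic edge faults follows. It is our own simulation with assumed conventions: the first phase is a single slot in which only the source transmits, nodes are labelled 0, ..., n-1, and `faulty_edges(t)` (a hypothetical parameter) returns the edges that are down at slot t. The strongly-selective family is supplied by the caller.

```python
def broad(n, edges, source, family, phases, faulty_edges=lambda t: set()):
    """Simulate Protocol BROAD on nodes 0..n-1 under a dynamic edge-fault pattern.

    family: list (S_1, ..., S_m), assumed strongly-selective for the
    in-neighbourhoods of the graph. Phase 1 is a single slot in which the
    source transmits; each later phase has m slots, and in its j-th slot the
    informed members of S_j transmit. Returns a dict: node -> slot of reception.
    """
    schedule = [{source}] + [set(S) for _ in range(phases - 1) for S in family]
    informed_at = {source: 0}
    for t, s_t in enumerate(schedule, start=1):
        transmitters = {v for v in s_t if v in informed_at}
        down = faulty_edges(t)
        # A node is informed iff exactly one message reaches it over an up edge.
        newly = [w for w in range(n)
                 if w not in informed_at
                 and sum(1 for u in transmitters
                         if (u, w) in edges and (u, w) not in down) == 1]
        for w in newly:
            informed_at[w] = t
    return informed_at
```

In the four-node example of the test, node 3 has in-neighbours {1, 2}; the family {0, 1}, {0, 2} isolates each of them once per phase, so even when edge (1, 3) is down in slot 2, node 3 is still informed within the second phase, in line with Lemma 4.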

Lemma 4. For any (dynamical) fault pattern $F$, at the end of phase $i$, every node at distance $i$ from the source $s$ in the residual subgraph $G_F$ is informed. So, BROAD completes broadcasting within $Dm$ time-slots, where $D$ is the residual source eccentricity.

Proof. The proof is by induction on the distance $i$. For $i = 1$ it is obvious. We thus assume that all nodes at distance $i$ have received the source message during the first $i$ phases. Consider a node $v$ at distance $i + 1$ in the residual subgraph and a node $u \in N(v)$ at distance $i$ in the residual subgraph. Notice that, since $v$ is at "residual" distance $i + 1$ from $s$, such a $u$ must exist. Moreover, $N(v)$ belongs to $\mathcal{N}$ and $\mathcal{S}$ is strongly-selective for $\mathcal{N}$, so there will be a time-slot in phase $i + 1$ in which only $u$ (among the nodes in $N(v)$) transmits the source message; since the edge $(u, v)$ belongs to $G_F$, it is not faulty in that time-slot, and $v$ will successfully receive the message.

It is now clear that the total number of time-slots required by the protocol to complete the broadcast is at most $1 + (D-1)m \le Dm$.

□ 

The above lemma motivates our interest in finding strongly-selective families 
of small size since the latter is a factor of the completion time of our protocol. 

The probabilistic construction of strongly-selective families in the proof of the next lemma can be efficiently (i.e., in polynomial time) de-randomized by means of a suitable application of the method of conditional probabilities [...]. This technique has been recently applied [...] to a weaker version of selectivity. Furthermore, as for strong selectivity, we can also use the deterministic efficient construction of superimposed codes given in [...], which yields strongly-selective families of size equivalent to that in our lemma.

Lemma 5. For any family $\mathcal{N}$ of sets, each of size at most $\Delta$, there exists a strongly-selective family $\mathcal{S}$ for $\mathcal{N}$ such that $|\mathcal{S}| = O(\Delta \max\{\log |\mathcal{N}|, \log \Delta\})$.

Proof. We assume, without loss of generality, that the sets in $\mathcal{N}$ are subsets of the ground set $[n]$ and that $\Delta \ge 2$ (for $\Delta = 1$ the family $\mathcal{S} = \{[n]\}$ trivially proves the lemma).

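We use a probabilistic argument: construct a set $S$ by picking every element of $[n]$ with probability $1/\Delta$. For fixed $N \in \mathcal{N}$ and $x \in N$ it holds that:

$$\Pr[N \cap S = \{x\}] = \frac{1}{\Delta}\Bigl(1 - \frac{1}{\Delta}\Bigr)^{|N|-1} \ge \frac{1}{\Delta}\Bigl(1 - \frac{1}{\Delta}\Bigr)^{\Delta} \ge \frac{1}{4\Delta} \qquad (3)$$

(where the last inequality holds since $\Delta \ge 2$). Consider now a family $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ where each set $S_i$ is constructed, independently, as above. From Ineq. (3), it follows that the probability that $\mathcal{S}$ is not strongly-selective for fixed $N \in \mathcal{N}$ and $x \in N$ is at most

$$\Bigl(1 - \frac{1}{4\Delta}\Bigr)^m \le e^{-\frac{m}{4\Delta}}$$

(the above bound follows from the well-known inequality $1 - t \le e^{-t}$, which holds for any real $t$). It thus follows that

$$\Pr[\mathcal{S} \text{ is not strongly-selective for } \mathcal{N}] \le \sum_{N \in \mathcal{N}} \sum_{x \in N} e^{-\frac{m}{4\Delta}} \le |\mathcal{N}|\,\Delta\, e^{-\frac{m}{4\Delta}},$$

and the last value is less than 1 for $m > 8\Delta \max(\log |\mathcal{N}|, \log \Delta)$.

□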
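The probabilistic argument of Lemma 5 suggests a simple Las Vegas procedure: draw $m$ random sets and retry until the family is strongly selective. The sketch below is ours, not the de-randomized construction mentioned above; it uses natural logarithms, takes $m$ as the smallest integer exceeding $8\Delta\max(\ln|\mathcal{N}|, \ln\Delta)$, and assumes $\Delta \ge 2$ and that every target set has at most $\Delta$ elements.

```python
import math
import random

def random_strongly_selective(targets, delta, rng=None):
    """Las Vegas version of the construction in the proof of Lemma 5.

    Each of the m sets contains every ground element independently with
    probability 1/delta. By the union bound the family fails with
    probability < 1, so we resample until the brute-force check succeeds.
    """
    if rng is None:
        rng = random.Random()
    ground = sorted(set().union(*targets))
    m = math.floor(8 * delta * max(math.log(len(targets)), math.log(delta))) + 1
    while True:
        family = [{x for x in ground if rng.random() < 1 / delta} for _ in range(m)]
        # Definition 2, checked exhaustively on the given targets.
        if all(any(set(N) & S == {x} for S in family) for N in targets for x in N):
            return family
```

With three targets and $\Delta = 3$, the sketch uses $m = \lfloor 24 \ln 3 \rfloor + 1 = 27$ sets; in practice one or two draws usually suffice, since the union bound is far from tight.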



Theorem 2. The oblivious FDB protocol BROAD completes broadcast within $O(D \min\{\Delta \log n, n\})$ time-slots on any $n$-node graph $G$ with maximum in-degree $\Delta$.

Proof. Since the graph has maximum in-degree $\Delta$, the size of any subset in $\mathcal{N}$ is at most $\Delta$. Hence, from Lemma 5 (and since $|\mathcal{N}| \le n$ and $\Delta \le n$), there exists a strongly-selective family $\mathcal{S}$ for $\mathcal{N}$ of size $|\mathcal{S}| \le \min\{c\Delta \log n, n\}$, for some constant $c > 0$ (the bound $n$ is due to the fact that the family of the $n$ singletons is always strongly-selective). The theorem is thus an easy consequence of the above bound and Lemma 4.

□ 



3 Adaptive Fault-Tolerant Protocols 

In this section, a lower bound on the completion time of adaptive FDB protocols is given. To this aim, we consider strongly-selective families for the family of all subsets of $[n]$ having size at most $\Delta$ (in short, $(n, \Delta)$-strongly-selective families). As for the size of such families, the following lower bound is known.
Theorem 3 [...]. If $\mathcal{S}$ is an $(n, \Delta)$-strongly-selective family, then

$$|\mathcal{S}| = \Omega\Bigl(\min\Bigl\{\frac{\Delta^2}{\log \Delta} \log n,\ n\Bigr\}\Bigr).$$

We will adopt the general definition of (adaptive) distributed broadcast protocols introduced in [2]. In such protocols, the action of a node in a specific time-slot is a function of its own label, the number of the current time-slot $t$, the input graph $G$, and the messages received during the previous time-slots.

Theorem 4. For any $n > 0$, any FDB protocol that completes broadcasting on the graph $K_n$ requires $\Omega(n\sqrt{n})$ time-slots.

Sketch of the proof. Given any protocol $P$, we define two sets of faults: a set of edges of $K_n$ that suffer a permanent fault (i.e., they are down from the beginning of the protocol on) and a set of dynamical node faults. The permanent faults are chosen in order to yield a layered graph which consists of $D + 1$ levels $L_0, L_1, \ldots, L_D$, where $D = \lfloor \sqrt{n}/2 \rfloor$. Level $L_0$ contains only the source $s$, level $L_j$, $j < D$, consists of at most $\Delta$ nodes with $\Delta = \sqrt{n}$, and $L_D$ consists of all the remaining nodes. All nodes of $L_{j-1}$ in $G_F$ have (only) outgoing edges to all nodes in $L_j$.

Both permanent and dynamical faults will be determined depending on the actions of $P$. As for permanent faults, instead of describing the set of faulty edges, we provide (in the proof of the next claim), for any $j = 1, \ldots, D$, the set of nodes that belong to $L_j$. This permanent fault pattern will be combined with the dynamical fault pattern (which is described below) in such a way that the protocol is forced to execute $\Omega\bigl(\frac{\Delta^2}{\log \Delta} \log n\bigr) = \Omega(n)$ time-slots in order to successfully transmit the initial message between two consecutive levels.

From Thm. 3 there exists a constant $c > 0$ such that any $(\lceil n/2 \rceil, \Delta)$-strongly-selective family must have size at least $T$, where $T \ge cn$. The theorem is thus an easy consequence of the following claim.
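For readers who want to check the parameters of the sketch, the following worked computation is our own (not from the original); it assumes logarithms in any fixed base, $\Delta = \sqrt{n}$, $D = \lfloor \sqrt{n}/2 \rfloor$, and $n$ large enough. It shows where $T \ge cn$, $|R| \ge \lceil n/2 \rceil$ and the final $\Omega(n\sqrt{n})$ come from.

```latex
\begin{align*}
\min\Bigl\{\frac{\Delta^2}{\log\Delta}\,\log\lceil n/2\rceil,\ \lceil n/2\rceil\Bigr\}
  &= \min\Bigl\{\frac{2n\,\log\lceil n/2\rceil}{\log n},\ \lceil n/2\rceil\Bigr\} = \Theta(n)
  && \Longrightarrow\ T \ge c\,n,\\
\bigl|L_0 \cup \cdots \cup L_{j-1}\bigr|
  &\le 1 + (D-1)\sqrt{n} \le 1 + \frac{n}{2} - \sqrt{n} \le \frac{n}{2}
  && \Longrightarrow\ |R| \ge \lceil n/2\rceil \quad (j \le D),\\
(D-1)\,T &\ge \Bigl(\Bigl\lfloor \frac{\sqrt{n}}{2} \Bigr\rfloor - 1\Bigr)\, c\,n = \Omega(n\sqrt{n}).
\end{align*}
```

The last line applies the claim below with $j = D - 1$: the message cannot reach $L_D$ before slot $(D-1)T$.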

Claim. For any $j \ge 0$, there exists a node assignment to $L_j$ and a pattern of dynamical node faults in the first $j$ levels such that $P$ does not broadcast the source message to level $L_{j+1}$ before the time-slot $jT$.

Proof. The proof is by induction on $j$. For $j = 0$, the claim is trivial.

We thus assume the thesis to be true for $j - 1$. Let us define

$$R = \{\text{nodes not already assigned to levels } L_0, \ldots, L_{j-1}\}.$$

Notice that $|R| \ge \lceil n/2 \rceil$. Let $L$ be an arbitrary subset of $R$. Consider the following two cases: i) $L_j$ is chosen as $L$; ii) $L_j$ is chosen as $R$ (i.e., all the remaining nodes are assigned to the $(j+1)$-th level). In both cases, the predecessor subgraph² $G_u$ of any node $u \in L$ is the one induced by $L_0 \cup L_1 \cup \ldots \cup L_{j-1} \cup \{u\}$ in $G_F$. It follows that the behavior of node $u$, according to protocol $P$, is the same in both cases. We can thus consider the behavior of $P$ when $L_j = R$. Then, we define

$$S_t = \{u \in R \mid u \text{ acts as transmitter at time-slot } (j-1)T + t\}$$

and the family $\mathcal{S} = \{S_1, \ldots, S_{T-1}\}$ of subsets of $R$. Since $|\mathcal{S}| < T$, $\mathcal{S}$ is not $(\lceil n/2 \rceil, \Delta)$-strongly-selective; so, a subset $L \subseteq R$ exists such that $|L| \le \Delta$ and $L$ is not strongly selected by $\mathcal{S}$ (and thus by $P$) in any time-slot $t$ such that $(j-1)T + 1 \le t \le jT - 1$. Let thus $u$ be a node in $L$ which is not selected. Then, the proof is completed by considering a suitable pattern of dynamical faults in such a way that: i) all the outgoing edges of $u$ are always fault-free, and ii) no node in $L$ (in particular, no node different from $u$) will successfully transmit during those $T - 1$ time-slots.

□

² Given a graph $G$, the predecessor subgraph $G_u$ of a node $u$ is the subgraph of $G$ induced by all nodes $v$ for which there exists a directed path from $v$ to $u$.

Since the residual graph yielded by the above proof has residual eccentricity $O(\sqrt{n})$, it also follows that

Corollary 2. No FDB protocol can achieve an $o(Dn)$ completion time on general $n$-node networks.

Acknowledgement. A significant credit goes to Paolo Penna for helpful discussions. More importantly, Paolo suggested that we investigate the issue of fault-tolerance in wireless networks.